Foreword



IFIP TC7 Conferences on System Modelling and Optimization are 
held every two years. Their subjects cover a wide range of methods, 
theory and applications. The present volume contains most of the invited 
papers and a few of the submitted ones that were presented at the 19th 
Conference, held in Cambridge, England from July 12th to 16th, 1999. 

The meeting was attended by about 210 participants from 37 countries.
Ten invited and 143 submitted talks were presented. These proceedings,
however, include only 14 of the papers, in order that each article
can be long enough to provide a coherent and substantial contribution 
to knowledge. Therefore all of the work that was submitted for possible 
publication was judged by unusually high standards. The editors are 
very grateful to the referees who helped in this way. A list of the other 
139 papers that do not appear now is also given. Most of them were not 
offered to the proceedings, because they addressed work in progress or 
will be published elsewhere. 

The official co-sponsors of the conference were IFIP (International 
Federation for Information Processing) and DAMTP (Department of 
Applied Mathematics and Theoretical Physics, University of Cambridge, 
UK). Their support was crucial and most welcome. Further, the fol- 
lowing companies and institutions made generous donations: Barrodale 
Computing Services (Canada), British Computer Society (UK), Cam- 
bridge University Press (UK), DOT Products (USA), General Motors 
(USA), IBM Unternehmensberatung (Germany), Kluwer Academic Pub- 
lishers (The Netherlands), London Mathematical Society (UK), Nomura 
International (UK), Numerical Algorithms Group (UK), Pareto Partners 
(UK) and Terrasciences (USA). Their sponsorship was highly important 
to the academic excellence of the programme, to the participation from 
a wide range of countries, to the enjoyment of the social activities and 
to the publication of these proceedings. 

We also ask many individuals to accept our thanks. The registra- 
tion desk helpers come to mind immediately, namely Caroline Powell, 
Catherine (and Georgia) Powell, Alice Powell and Ortelia Bejancu. We 
are grateful for the addresses by Alec Broers (Vice Chancellor of
Cambridge University), David Hartley (President of the British Computer
Society) and Peter Kall (Chairman of IFIP TC7) at the Opening Session
and also for the speech by Josef Stoer at the conference dinner. Further,
the invited speakers gave some brilliant talks and the authors of the 
contributed papers enhanced the quality of the occasion. We received 
very valuable assistance from the University of Cambridge through staff 
on the New Museums Site, in DAMTP and in the Judge Institute of 
Management Studies. We are indebted to the University Centre and to 
Pembroke College, especially Paula Hunt and Ken Smith, for the success 
of the social events. Finally, every word that you read in these proceed- 
ings has been produced in its present form by Hans-Martin Gutmann. 
It is a pleasure to acknowledge all of these contributions. 

M.J.D. Powell Cambridge 

S. Scholtes 
1. INTRODUCTION

The field of variable-topology shape design in structural optimiza- 
tion has its origins in theoretical studies of existence of solutions in 
variational problems, in particular shape optimization problems, and in 
studies in theoretical material science of variational bounds on material 
properties. Progress in computational methods and the ever increasing 
computer power has oriented the area towards applications, with sig- 
nificant developments being achieved over the last decade, leading to a 
fairly widespread use of the methodology in industry. 

In this short paper we outline some of the basic ideas and methods
of the field, but it is not our purpose to cover all work and
approaches. Instead we refer to existing literature containing
rather comprehensive surveys, see e.g., [7, 8, 31]. Moreover, note that
reference is mostly made to recent papers that include bibliographies
useful for an overview of the area. Thus the presentation does not try
to present a complete historical perspective.

The area of computational variable-topology shape design of continuum
structures is presently dominated by methods which employ a material
distribution approach for a fixed reference domain, in the spirit of
the so-called ‘homogenization method’ for topology design ([3, 9]). That
is, the geometric representation of a structure is similar to a grey-scale
rendering of an image, in discrete form corresponding to a raster
representation of the geometry on a fixed reference domain. The physics
of the problem is also represented by boundary conditions and forcing
terms defined on this fixed reference domain, much in analogy to
fictitious domain methods for FEM analysis.

One can normally distinguish between three versions of raster based 
geometry models for continuum topology optimization. The basic problem
is an unrestricted ‘0-1’ integer design problem (generalized shape
optimization), that is, a design specifies unambiguously whether there 
is solid material or void at every point in a candidate design region. 
Otherwise, there are no restrictions on the shape. Unfortunately, in 
general, this class of problems is ill-posed in the continuum setting (cf., 
[11, 20]). Well-posed problems can be obtained by either extending the 
space of admissible solutions to obtain their relaxed versions, usually by 
incorporating microstructure (see, e.g., [3]), or by restricting the space 
of admissible solutions. The latter can be accomplished by enforcing 
an upper bound on the perimeter of the structure (see [28], and refer- 
ences therein), by imposing constraints on the slopes of the parameters 
defining geometry (see [29], and references therein), by the introduction 
of a filtering function limiting the minimum scale (see [10, 36] for an 
overview), or one can introduce a ground structure with a fixed number 
of design degrees of freedom (like a fixed mesh for design). Here the first 
three restriction methods are well-posed in the setting of a continuum 
description of the design problem, while in the latter case existence relies 
entirely on the finite dimension. 

Relaxation usually yields a set of continuously variable design fields 
to be optimized over a fixed domain, so the algorithmic problems asso- 
ciated with the discrete 0-1 format of the basic problem statement are 
circumvented. The continuum relaxation approach can be very involved 
theoretically (see for example [1, 13]) and much work is still needed in 
this area. 

The restriction approach leads to ‘classical designs’ and there is no 
ambiguity as to the physical modelling (local material response is de- 
termined solely by the presence or absence of the given solid material). 
However, the major challenge is the solution of a large-scale integer pro- 
gramming problem. Due to the high cost of function calls for these 
problems, solving the 0-1 formulation directly, for example by genetic 
algorithms or simulated annealing, is, in general, not viable for very 
large-scale problems, and this is an area that should have more focus in 
the coming years (see also below). Another - and the most commonly 
used - approach is to replace the integer variables with continuous vari- 
ables and then introduce some form of penalty that steers the solution 
to discrete 0-1 values. A key part of these methods is the introduction 
of interpolation functions (often interpreted as material densities) that 
express various physical quantities (e.g., material stiffness, cost, etc.) as 
a function of the continuous variables. Moreover, geometric properties
also require suitable interpretation. Although there is a strong resem- 
blance to the relaxed formulations, it is important to recognize that the 
continuous format is then merely part of a computational strategy which 
does not alter the ultimate goal of solving an integer problem, i.e., to 
obtain a black-and-white design. 

2. BASIC PROBLEM STATEMENT 

In continuum topology design we seek the optimal distribution of material
in a fixed reference domain $\Omega$ in $R^2$ or $R^3$, with the term
‘optimal’ being defined through choice of objective and constraint
functions. The objective and constraint functions involve some kind of
physical modelling that provides a measure of efficiency within the
framework of a given area of applications, here structural mechanics. We
thus consider a mechanical element as a body occupying a domain
$\Omega^m$ which is part of $\Omega$, on which applied loads and boundary
conditions are defined (this reference domain is often called the
ground-structure). Referring to the reference domain $\Omega$ we can
define an example problem as a minimization of force times displacement,
over admissible designs and displacement fields satisfying equilibrium
(the minimum compliance problem):



$$
\begin{aligned}
&\min_{C,\,u}\ \int_\Omega f u \, d\Omega \\
&\text{subject to:} \\
&\int_\Omega C_{ijkl}(x)\, \varepsilon_{ij}(u)(x)\, \varepsilon_{kl}(v)(x)\, d\Omega
   = \int_\Omega f v \, d\Omega \quad \text{for all } v \in U, \\
&C_{ijkl} \in C_{ad}.
\end{aligned}
\tag{1}
$$

Here the equilibrium equation is written in its weak, variational form,
with $U$ denoting the space of kinematically admissible displacement
fields, $u$ the equilibrium displacement, $f$ the forces, and
$\varepsilon(u)$ the linearized strains. The rigidity tensor $C_{ijkl}$ is
the design variable of our problem, and the various definitions of the
set $C_{ad}$ of admissible rigidity tensors are what distinguish the
various settings for the design problem. This type of problem is often
labelled a problem of ‘control in the coefficients’ (cf. [22]), here with
the controls entering in the high order part of the governing
differential operator.

A classical variant of problem (1) is the so-called variable thickness
problem, where the set of admissible designs is defined through a
thickness function $h$:

$$
C_{ad} = \Big\{ C_{ijkl} = h\, C^0_{ijkl}, \quad h \in L^\infty(\Omega), \quad
\int_\Omega h \, d\Omega = V, \quad 0 \le h \le h_{\max} \Big\}
\tag{2}
$$

for given material properties $C^0_{ijkl}$ and given volume $V$. This
problem is rather unusual, as one here has existence of solutions in the
‘naive’ setting, i.e., in the formulation which immediately springs to
mind when modelling the problem (see [27] and references therein).

Figure 1 The generalized shape design problem of finding the optimal
material distribution.

For a topology design setting, defined through designs given as domains
$\Omega^m$ of material points, the admissible designs are defined by a
point-wise volume fraction $\Theta$ of a given material, and this density
can only attain the values zero or one (a black-and-white design), cf.
Figure 1:

$$
\begin{aligned}
&C_{ijkl}(x) = \Theta(x)\, C^0_{ijkl}, \\
&\Theta(x) = 1 \ \text{if } x \in \Omega^m, \qquad
 \Theta(x) = 0 \ \text{if } x \in \Omega \setminus \Omega^m, \\
&\mathrm{Vol}(\Omega^m) = \int_\Omega \Theta(x)\, d\Omega.
\end{aligned}
\tag{3}
$$

Here existence of solutions usually requires further consideration as to
the modelling of the problem. Loosely speaking, materials with a
structural hierarchy allow for stiffer structures, as seen in nature in
bone, wood, etc. and used in composite structures (see Figure 2). This
can lead to a lack of existence of solutions, as such composites can be
constructed as limiting sequences of designs defined by (3); however,
composites are not covered by (3) (if $C^0_{ijkl}$ is isotropic, the
material in (3) is at any point isotropic, but a composite will usually
be anisotropic). One says that the set of designs is not closed under
G-convergence (or H-convergence) (cf., e.g., [20, 23]).

One technique to obtain a well-posed problem, as mentioned in the
introduction, is to introduce an upper-bound constraint on, for example,
the perimeter of the domain $\Omega^m$, thus limiting the geometric
complexity of the domain so that a structural hierarchy cannot be
generated (existence still requires a formal proof, as seen in [4, 10]).
If the geometric constraint is not imposed in (1), the continuum problem
is ill-posed, and an alternative for obtaining existence of solutions is
to extend the problem to its relaxed form using composites formed by
mixing void and the given material. This changes the nature of the
problem from one of seeking black-and-white designs to a situation where
‘grey-scale’ structures may also appear.

Figure 2 A ‘traditional’ design of a beam with holes, as seen in aircraft
etc., compared with elements of a minimizing sequence of designs with
finer and finer scales, with the limit (4) consisting of composites in
large areas. Note that even design (1), which is an optimal topology
within a limit on geometric complexity, is 40% stiffer than the design
with round holes, for the same amount of material.

Problem (1) with design set (3) (regularized through a geometric con- 
straint) is a large scale discrete optimization problem and it has in this 
form only recently been handled with success computationally ([6]). The 
most popular method for solving (1), and an approach that has been ex- 
tremely successful for many applications, is to consider formulations in 
terms of continuous variables, with the goal of using derivative based 
mathematical programming algorithms. This means that one changes 
the model for the material properties to a situation where the volume 
fraction is allowed to take on any value between zero and one. It may also 
involve finding an appropriate method for limiting geometric complexity, 
for example, exchanging the total variation of a density for the perimeter 
of a domain. This means that one pursues the optimal black-and-white
design through an iterative computational procedure which at
intermediate steps operates with ‘grey’ designs. This can be rather
confusing, as the computational procedures for the geometrically
constrained black-and-white design problem and for the relaxed
formulation can, deceptively, be very similar.

3. A FRAMEWORK FOR COMPUTATIONS 

Computational means are central for solving problems of the type
(1). Typically a finite dimensional version of the problem is generated
first, normally with finite element approximations of displacement fields
and design variables. This leads to a non-linear, non-convex, large scale
optimization problem of the generic form

$$
\begin{aligned}
&\inf_{u,\,D}\ p^T u \\
&\text{subject to:}\quad K(D)\, u = p, \\
&\sum_i D_i = V, \quad 0 < D_{\min} \le D_i \le D_{\max},
\end{aligned}
\tag{4}
$$

where $u$ here is the vector of nodal displacements and $D$ the design
variables, entering the equilibrium equation through the stiffness
matrix $K$. This problem could be solved directly via large-scale general
purpose mathematical programming techniques or by use of specially
developed algorithms that utilize the problem structure seen in (4). Such
techniques treat both displacements and design variables as independent
variables and the equilibrium equation is solved by the optimization
code (an example of this can be found in [41]).

Such an approach is not very common, as one traditionally rewrites
the problem as a problem in the design variables only, as

$$
\begin{aligned}
&\inf_{D}\ F(D) \\
&\sum_i D_i = V, \quad 0 < D_{\min} \le D_i \le D_{\max},
\end{aligned}
\tag{5}
$$

where the equilibrium equation is considered as a function-call:

$$
F(D) = p^T u, \quad \text{where } u \text{ solves } K(D)\, u = p.
\tag{6}
$$
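The nested formulation (5)-(6) can be illustrated by a minimal sketch in
Python (NumPy). The structure used here, a chain of springs clamped at
one end, as well as the function names and the unit stiffness `k0`, are
assumptions made purely for illustration and do not come from the paper.
The point is only that the optimizer sees `compliance(D, p)` as a black
box, while the equilibrium solve happens inside the call.

```python
import numpy as np

def assemble_stiffness(D, k0=1.0):
    """K(D) for a chain of len(D) springs clamped at its left end.

    Spring i has stiffness k0 * D[i], so K is linear in the design,
    as in the variable thickness setting (an illustrative assumption).
    """
    D = np.asarray(D, dtype=float)
    n = len(D)
    B = np.eye(n) - np.eye(n, k=-1)      # (B u)_i = elongation of spring i
    return B.T @ (k0 * D[:, None] * B)

def compliance(D, p):
    """F(D) = p^T u, where u solves K(D) u = p (the 'function call')."""
    u = np.linalg.solve(assemble_stiffness(D), p)
    return float(p @ u)

p = np.array([0.0, 0.0, 1.0])            # unit load at the free end
print(compliance(np.ones(3), p))         # three unit springs in series
print(compliance(2.0 * np.ones(3), p))   # doubling the design halves F
```

In a realistic setting the linear solve would be replaced by a call to a
finite element code, which is why each evaluation of F is expensive.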

If function gradients are required by the optimization algorithm,
sensitivity analysis becomes a central theme, i.e., one should devise
efficient means to compute the derivatives of functions involving the
displacements. This usually involves the so-called adjoint system, as
also known from control theory. Provided the stiffness matrix is positive
definite, we have here the simple expression (the computation of the
derivative of the stiffness matrix with respect to design is simple for
topology design):

$$
\frac{\partial F}{\partial D_i} = -u^T \frac{\partial K}{\partial D_i}\, u,
\quad \text{where } u = (K(D))^{-1} p.
\tag{7}
$$

However, in many cases this analysis is not so simple. Sensitivity analysis 
is actually, in its own right, an important area of computer aided design, 
as it provides information on changes in performance with respect to 
design variables, see for example papers in [18]. 
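As a hedged illustration of the sensitivity expression (7), the sketch
below evaluates the gradient of the compliance for a clamped chain of
springs and compares it with central finite differences. The spring
chain, the function names and the step size `h` are illustrative
assumptions, not part of the paper. For this model the derivative of K
with respect to D_i is a rank-one matrix, so the gradient reduces to
minus the squared elongation of spring i.

```python
import numpy as np

def stiffness(D, k0=1.0):
    """Return K(D) = B^T diag(k0 * D) B and the elongation operator B."""
    D = np.asarray(D, dtype=float)
    n = len(D)
    B = np.eye(n) - np.eye(n, k=-1)
    return B.T @ (k0 * D[:, None] * B), B

def compliance_and_gradient(D, p, k0=1.0):
    """F = p^T u and dF/dD_i = -u^T (dK/dD_i) u = -k0 * (B u)_i**2."""
    K, B = stiffness(D, k0)
    u = np.linalg.solve(K, p)
    elongation = B @ u
    return float(p @ u), -k0 * elongation**2

def finite_difference_gradient(D, p, h=1e-6):
    """Central differences; needs two extra solves per design variable."""
    D = np.asarray(D, dtype=float)
    g = np.zeros_like(D)
    for i in range(len(D)):
        step = np.zeros_like(D)
        step[i] = h
        g[i] = (compliance_and_gradient(D + step, p)[0]
                - compliance_and_gradient(D - step, p)[0]) / (2.0 * h)
    return g

D = np.array([1.0, 2.0, 0.5])
p = np.array([0.5, 0.0, 1.0])
F, g = compliance_and_gradient(D, p)
print(F, g)
print(finite_difference_gradient(D, p))
```

The analytical gradient needs no additional solve beyond the one used to
evaluate F, whereas the finite difference check scales with the number
of design variables; this is the practical reason for preferring
expressions of the type (7) in topology design.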

In many situations it is convenient to operate on (5), as one here treats 
the equilibrium analysis as a function call, as represented by, e.g., com- 
mercial software packages for finite element analysis. Moreover, in many 
design problems, such as shape design via a CAD definition of bound- 
aries, the number of design variables is much smaller than the number 
of degrees of freedom of the displacement, and an optimization code 
used for (5) will thus treat a moderately sized problem (with ‘expensive’
function calls). However, for topology design, even problem (5) is large- 
scale; but in many cases one can formulate relevant problems involving 
only a moderate number of constraints, which are not box-constraints 
(as in the model problem used here). Thus dual methods have become 
popular, and especially techniques based on separable, convex approxi- 
mations (CONLIN, see [16], and MMA, see [39]) have been found to be 
particularly effective. 
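The idea behind such dual methods can be sketched as follows. The code
below is not CONLIN or MMA; it is a simplified illustration under stated
assumptions: the compliance of a clamped spring chain is approximated by
a separable convex function of the reciprocal variables, sum(c_i / D_i),
and the resulting subproblem with one volume constraint and box
constraints is solved through its single dual variable `lam` by
bisection. All names and the test structure are hypothetical.

```python
import numpy as np

def compliance_and_gradient(D, p):
    """Compliance p^T u and its gradient for a clamped spring chain."""
    n = len(D)
    B = np.eye(n) - np.eye(n, k=-1)
    K = B.T @ (D[:, None] * B)
    u = np.linalg.solve(K, p)
    return float(p @ u), -(B @ u) ** 2

def solve_separable_subproblem(c, V, d_min, d_max, iters=200):
    """Minimize sum(c_i / D_i) s.t. sum(D_i) = V, d_min <= D_i <= d_max.

    For a given multiplier lam > 0 the Lagrangian separates, giving
    D_i(lam) = clip(sqrt(c_i / lam), d_min, d_max). The volume is
    non-increasing in lam, so lam is found by (geometric) bisection.
    Assumes n * d_min <= V <= n * d_max.
    """
    c = np.asarray(c, dtype=float)

    def primal(lam):
        return np.clip(np.sqrt(c / lam), d_min, d_max)

    lo, hi = 1e-12, 1e12
    for _ in range(iters):
        lam = np.sqrt(lo * hi)
        if primal(lam).sum() > V:
            lo = lam          # too much material: increase the multiplier
        else:
            hi = lam
    return primal(np.sqrt(lo * hi))

def optimize(p, V, d_min, d_max, n_iter=20):
    """Sequence of reciprocal approximations, each solved via the dual."""
    n = len(p)
    D = np.full(n, V / n)                 # uniform starting design
    for _ in range(n_iter):
        _, g = compliance_and_gradient(D, p)
        c = -g * D**2                     # reciprocal expansion coefficients
        D = solve_separable_subproblem(c, V, d_min, d_max)
    return D, compliance_and_gradient(D, p)[0]

D_opt, F_opt = optimize(np.array([0.5, 0.0, 1.0]), V=3.5, d_min=0.1, d_max=5.0)
print(D_opt, F_opt)
```

Because only one constraint besides the box constraints is present, the
dual problem is one-dimensional regardless of the number of design
variables, which is the structural feature the text refers to.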

We close this brief section on computational issues by noting that the 
problem at hand is a two field problem which may develop checkerboards 
in computations (as for a Stokes flow problem). Constraints limiting geometric
complexity usually remove such effects (for a fine enough mesh),
but in a relaxed setting (with existence of solutions, but no geometric 
constraints) special care must be taken to avoid such ‘polluted’ results. 
See [36] for an overview and further discussion. 
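One common way of suppressing checkerboards in practice is to filter the
design field. The sketch below shows a generic density filter with
linearly decaying ("hat") weights on a regular grid; it is an
illustrative assumption and is not claimed to be the specific filter
analysed in [10] or surveyed in [36]. The function name and the `radius`
parameter (measured in element widths, assumed positive) are hypothetical.

```python
import numpy as np

def density_filter(rho, radius):
    """Weighted average of rho over neighbours closer than `radius`.

    Weights decay linearly with distance and are normalized per element,
    so a constant field is reproduced exactly.
    """
    rho = np.asarray(rho, dtype=float)
    ny, nx = rho.shape
    r = int(np.ceil(radius))
    out = np.zeros_like(rho)
    for j in range(ny):
        for i in range(nx):
            j0, j1 = max(j - r, 0), min(j + r + 1, ny)
            i0, i1 = max(i - r, 0), min(i + r + 1, nx)
            jj, ii = np.mgrid[j0:j1, i0:i1]
            w = np.maximum(0.0, radius - np.hypot(jj - j, ii - i))
            out[j, i] = (w * rho[j0:j1, i0:i1]).sum() / w.sum()
    return out

# A pure checkerboard pattern is strongly damped by the filter.
board = np.indices((6, 6)).sum(axis=0) % 2
smooth = density_filter(board, radius=1.5)
print(board.max() - board.min(), smooth.max() - smooth.min())
```

The filter radius sets a minimum length scale for the design, which is
how it relates to the restriction methods mentioned in the introduction.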

4. HOMOGENIZATION MODELS WITH 
ANISOTROPY 

The initial work on numerical methods for topology design of contin- 
uum structures was based on using composite materials as the basis for 
describing varying material properties in space ([9]). This approach was 
inspired by theoretical studies on generalized shape design in conduction 
and torsion problems and by numerical and theoretical work related to 
plate design (see, e.g., [11, 17, 23]). Initially, composites consisting of 
square or rectangular holes in periodically repeated square cells were 
used for planar problems. Later so-called ranked laminates (layers) have 
become popular, both because analytical expressions of their effective 
properties can be given and because investigations proved the optimality
of such composites, in the sense of bounds on effective properties (for
an overview, see for example [2]). Also, with layered materials existence 
of solutions to the minimum compliance problem for both single and 
multiple load cases is obtained, without any need for additional con- 
straints on the design space (e.g., without constraints on the geometric 
complexity), and thus interpolation and relaxation are both provided for. 
However, the relaxed models seldom result in black-and-white designs 
in themselves. For all the models mentioned here, homogenization tech- 
niques for computing effective moduli of materials play a central role. 
Hence the use of the phrase ‘the homogenization method for topology 
design’ for procedures involving this type of modelling. 

The homogenization method for topology design involves working 
with orthotropic or anisotropic materials. This adds to the requirements 
of the finite element analysis code, but the main additional complication
is the extra design variables required to describe the structure. Thus, a
so-called rank-2 microstructure with two orthogonal layers of material
(at two scales and in dimension two) requires three distributed variables,
as the material properties at each point of the structure will depend on 
two size-variables characterizing the layer thicknesses and on one vari- 
able characterizing the angle of rotation of the material axes (the axes 
of the layers). 

5. THE SIMP MODEL 

Complementing the use of the homogenization method, where anisotropic
composites are a priori accepted as part of the design space, a
popular method to model material properties which are isotropic at
intermediate densities is the so-called penalized, proportional fictitious
material SIMP-model (SIMP: Solid Isotropic Material with Penalization),
see for example [8] for an overview. In this model a continuous
variable $\rho$ is introduced, with $0 \le \rho \le 1$, resembling a
density of material by the fact that the volume of the structure is
evaluated as

$$
\mathrm{Vol} = \int_\Omega \rho(x)\, d\Omega.
\tag{8}
$$

The relation between this density and the material tensor $C_{ijkl}$ in
the equilibrium analysis is written as

$$
C_{ijkl}(\rho) = \rho^p\, C^0_{ijkl}, \qquad p > 1,
\tag{9}
$$

where the given material has stiffness given by $C^0_{ijkl}$. The
interpolation (9) satisfies $C_{ijkl}(0) = 0$ and
$C_{ijkl}(1) = C^0_{ijkl}$, meaning that if a final design has density
zero or one at all points, this design is a black-and-white design for
which the performance has been evaluated with a correct physical model.
For problems where the volume constraint is active, experience shows that
optimization does actually result in such designs if one chooses the
power $p$ sufficiently big (in order to obtain true ‘0-1’ designs,
$p \ge 3$ is usually required). The reason is that for such a choice one
imposes a penalization on intermediate densities (volume is proportional
to $\rho$ but stiffness is less than proportional).

Figure 3 Microstructures of material and void realizing the material
properties of the SIMP model with p = 3 (Poisson’s ratio is 0.33). As
stiffer material microstructures can be constructed from the given
densities, non-structural areas are seen at the cell centers. See [8].
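The penalizing effect of the power law can be seen in a small numerical
sketch. Assume, purely for illustration, two elements acting in parallel
that share one unit of material; the function names and the scalar
stiffness `E0` are hypothetical. The total stiffness is then the sum of
the two interpolated stiffnesses, and for p > 1 this sum is largest when
all material is placed in one element.

```python
import numpy as np

def simp_stiffness(rho, p, E0=1.0):
    """Power-law interpolation E(rho) = rho**p * E0, with 0 <= rho <= 1."""
    return E0 * np.asarray(rho, dtype=float) ** p

def best_split(p, n=101):
    """Distribute one unit of material between two parallel elements.

    Returns the density of the first element at the stiffest split, the
    corresponding total stiffness, and the stiffness of the even
    ('grey') split rho = (0.5, 0.5).
    """
    rho1 = np.linspace(0.0, 1.0, n)
    total = simp_stiffness(rho1, p) + simp_stiffness(1.0 - rho1, p)
    return rho1[np.argmax(total)], total.max(), total[n // 2]

for power in (1.0, 2.0, 3.0):
    print(power, best_split(power))
```

For p = 1 every split gives the same total stiffness, so nothing drives
the design towards 0-1 values; as p grows, the grey split becomes
increasingly uneconomical for the same volume.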

For the SIMP interpolation (9) it is not immediately apparent that
areas of grey can be interpreted in physical terms. However, it turns
out that under fairly simple conditions on $p$, any stiffness used in the
SIMP model can be realized as the stiffness of a composite made of
void and an amount of the base material corresponding to the relevant
density, see Figure 3. Thus using the term ‘density’ for the interpolation
function $\rho$ is quite natural. The geometries in Figure 3 represent
periodic composites with repetitive cells, obtained through the
methodology of inverse homogenization (material design) described in
[33]. This method employs topology design (using SIMP!) for the cell of
the microstructure to obtain specified material properties of the
composite, making the dog bite its tail.

Figure 4 Design of a bracket for an automobile application. Original design and
design achieved by using topology design in the initial design phase. By courtesy of
Altair Engineering.

6. APPLICATIONS 

Computational means for topology design have now been developed
into reliable tools for problems involving stiffness (including
multiple load problems) and vibration criteria (as a reinforcement
problem, see, e.g., [12]). The efficiency of topology design as a tool
in the initial phase of a design process and the appearance of commercial
software have led to a broad acceptance of the method in industry, see,
e.g., [5, 32, 40] (see also Section 8). Research has also led to
consideration of extended classes of problems, encompassing more involved
settings for the physical model, for the optimization formulation, as
well as for the design parametrization. Thus non-linear behaviour (large
displacements, non-linear material behaviour), local constraints (i.e.,
constraints imposed on all points in the domain) as well as
multi-material problems are covered in current research. The areas of
applications are also being extended, research moving into such fields
as design of materials, see Figure 5, and design of multiple physics
devices (like MEMS, Micro-Electro-Mechanical-Systems), see Figure 6. For
examples in the literature we refer to [14, 19, 24, 25, 34, 37].

It is important in this context to note that direct links between specific
classes of composites and proofs of existence for many of these extended
problems have yet to be discovered, i.e., it is not known for these
problems what composites will make relaxation intrinsic in such an
approach. The emphasis in the field is currently on modelling and on the
development of computational means, mostly relying on restriction methods
as an underlying implicit guarantee of well-posedness. Thus, in the
context of theory, the pace of implementations for new problem types
has far overtaken the full mathematical understanding of the interplay
between methods for obtaining ‘classical’ solutions and methods for
deriving relaxed problem settings, and here lies an immense challenge for
the future. For a mathematician there is thus an abundance of (hard)
problems to study.

Figure 5 Design of a periodic composite with negative Poisson ratio.
Topology optimization is applied to the unit cell of periodicity, and the
effective parameters of the composite are computed via homogenization.
The right-hand picture shows a micro-device representing the composite.
By courtesy of Ole Sigmund, Ulrik Darling Larsen and Siebe Bouwstra, see
also [21, 33].

To exemplify the modelling considerations we will here briefly
outline some aspects of treating multiple physics problems. The phrase
‘multiple physics’ is here used to cover topology design where several
physical phenomena are involved in the problem statement, thus covering
situations where for example elastic, thermal and electromagnetic
analyses are involved. When modelling such situations, the basic concept
of the density distribution method for topology design provides a general
framework for computations, but here the initial obstacle is the need for
interpolation of not only stiffness but also other physical properties. If
only linear models are considered, one possibility is to use the theory and
computational framework of homogenization of composite media to compute
effective elastic, thermal and electromagnetic properties of a given
type of composite and use such relationships between intermediate density
and material properties in the design problem. An example of this
approach for thermo-elastic problems can be found in [30]. However,
the less complex design description of the SIMP approach has also led
to the development of such interpolation schemes for multiple physics
problems. As an example of this, in reference [37], microstructures with
extreme thermal expansion are designed by combining a material
interpolation of elastic properties with a similar interpolation of the
thermal expansion coefficients.

Figure 6 Design of a thermo-electro-mechanical actuator. The mid right-hand point
should move right or up when voltage is applied at the corresponding poles. The
movement is through elastic deformation due to the temperature change which occurs
when a current passes through the structure. By courtesy of O. Sigmund, [35].

7. DIFFICULTIES - FUTURE WORK 

The models described here all refer to an approach to topology de- 
sign where material is distributed in a fixed domain. A pivotal aspect 
of this idea in computational implementations is the use of a fixed FEM 
mesh for the domain. This is not an inherent requirement, but is use- 
ful for computational efficiency. If a topology of material and void is 
the goal of the design process, this will imply that low density areas 
are also included in the analysis for each feasible design. For stress 
constraints this leads to the difficulty of the so-called ‘stress singularity 
phenomenon’, where low density regions may have high stress but are 
structurally insignificant for the final design, and where regularization 
(in the sense of mathematical programming) is needed for numerically 
solving such problems ([14]). A similarly elusive problem arising from 
this basic design representation appears in situations involving stability 
and vibration criteria. The relevant data to consider in such situations 
are the eigenvalues of the structurally relevant parts of the structure, 
i.e., the buckling loads and the vibration frequencies of the ‘black’ part
of a black-and-white design. In a true black-and-white design these are
the non-zero eigenvalues, but at intermediate steps of an iterative
optimization method implemented with interpolation schemes it can become
unclear which values are the relevant ones to consider. Examples of this are
localized modes which appear in low density regions and which should
be filtered out in order that the optimization deals with the structurally
interesting modes. Moreover, one should cater for certain aspects which
complicate modelling, for example that no structure whatsoever has
the highest eigenfrequency of all structures with only structural mass.
See references [24, 26] for further discussion. 

For future work it would be beneficial to reconsider the way geom- 
etry representation is currently handled. This should include studying 
alternatives to the current close linking of geometry and physics where 
the same finite element grid is used to approximate the geometry and 
the mechanical response fields. Here grid-less methods may prove use- 
ful. Typically, in fixed domain material distribution methods, the grid 
is a uniform, rectangular partition of space and the design variables are 
assumed to be constant within each element. Thus, a raster model is 
imposed on the geometry. This approach has a number of very benefi- 
cial features in terms of computational efficiency, but it also has some 
intrinsic less than desirable features (non-smooth boundary representa- 
tion etc.) which are typically not too severe if they are recognized and 
addressed in the implementation as well as in the interpretation of the 
computer generated designs. However, as outlined above for the stress 
constrained and vibration constrained problems, the fixed grid represen- 
tation for geometry and analysis implies that it can be rather tricky to 
handle the correct formulation of objectives and constraints as well as 
interpolating physical data for intermediate densities, a feature which 
gets exaggerated when considering non-linear material behaviour. The
latter could be circumvented by further success in handling the 0-1 prob- 
lem directly, an area of utmost importance. For the problems arising due 
to the fictitious domain type approach to analysing on a fixed domain 
a new approach in the area of topology design may be the only option. 
In view of this, it seems that future research could also benefit from 
investigations into alternative approaches to topology design for black- 
and-white design, with an emphasis on the geometric modelling aspect. 
Here the concept of the bubble method ([15]) should no doubt receive 
further attention, especially in the light of recent work on the concept 
of a general topological derivative, see [38]. Work in this direction could 
also be helpful in an effort to rejuvenate the field of optimal boundary 
shape design, an important area that is unfortunately rather dormant at
present.
8. USING THE WEB

Much useful information on topology design can be found on the In- 
ternet. A free topology optimization test-site (make your own design!) 
can be found at http://www.topopt.dtu.dk/. Here one can also find 
tutorials on the subject, a free 99 line MATLAB topology optimization 
code, as well as interesting links to academic and commercial sites. 

Acknowledgments 

The author would like to thank Ole Sigmund for permission to use several illustra- 
tions shown in this presentation. Kind assistance by Jens Gravesen when preparing 
this paper is also gratefully acknowledged. 

References 

[1] G. Allaire, E. Bonnetier, G. Francfort and F. Jouve (1997), Shape
optimization by the homogenization method, Numerische Mathematik,
76, pp. 27-68.

[2] G. Allaire and R. Kohn (1993a), Optimal bounds on the effective
behaviour of a mixture of two well-ordered elastic materials,
Quarterly of Applied Mathematics, 51, pp. 643-674.

[3] G. Allaire and R. Kohn (1993b), Optimal design for minimum
weight and compliance in plane stress using extremal
microstructures, European Journal of Mechanics A, 12, pp. 839-878.

[4] L. Ambrosio and G. Buttazzo (1993), An optimal design problem
with perimeter penalization, Calculus and Variation, 1, pp. 55-69.

[5] A. Back-Pedersen (1999), Taking advantage of using both topology
and shape optimization for practical design, in Proc. NAFEMS
World Congress ’99, Newport, Rhode Island, NAFEMS Ltd, pp. 889-899.

[6] M. Beckers (1999), Topology optimization using a dual method
with discrete variables, European Journal of Mechanics A, 17, pp.
14-24.

[7] M. Bendsøe (1995), Optimization of Structural Topology, Shape,
and Material, Springer Verlag, Heidelberg.

[8] M. Bendsøe and O. Sigmund (1999), Material interpolation schemes
in topology optimization, Archives of Applied Mechanics, 69, pp.
635-654.

[9] M. Bendsøe and N. Kikuchi (1988), Generating optimal topologies
in structural design using a homogenization method, Computer
Methods in Applied Mechanics and Engineering, 71, pp. 197-224.







[10] B. Bourdin (1999), Filters in topology optimization, Danish Center
for Applied Mathematics and Mechanics, DCAMM Report No. 628.

[11] G. Cheng and N. Olhoff (1981), An investigation concerning
optimal design of solid elastic plates, International Journal of Solids
and Structures, 17, pp. 305-323.

[12] A. Diaz and N. Kikuchi (1992), Solutions to shape and topology
eigenvalue optimization problems using a homogenization method,
International Journal for Numerical Methods in Engineering, 35,
pp. 1487-1502.

[13] A. Diaz and R. Lipton (1997), Optimal material layout for 3D
elastic structures, Structural Optimization, 13, pp. 60-64.

[14] P. Duysinx and M. Bendsøe (1998), Topology optimization of
continuum structures with local stress constraints, International
Journal for Numerical Methods in Engineering, 43, pp. 1453-1478.

[15] H. Eschenauer, V. Kobelev and A. Schumacher (1994), Bubble
method of topology and shape optimization of structures, Structural
Optimization, 8, pp. 42-51.

[16] C. Fleury (1989), CONLIN: an efficient dual optimizer based on
convex approximation concepts, Structural Optimization, 1, pp. 81-89.

[17] J. Goodman, R. Kohn and L. Reyna (1986), Numerical study of a
relaxed variational problem from optimal design, Computer
Methods in Applied Mechanics and Engineering, 57, pp. 107-127.

[18] M. Kamat (1995), Structural Optimization - Status and Promise,
AIAA, Washington D.C.

[19] N. Kikuchi, S. Nishiwaki, J. Fonseca and E. Silva (1998), Design
optimization method for compliant mechanisms and material
microstructure, Computer Methods in Applied Mechanics and
Engineering, 151, pp. 401-417.

[20] R. Kohn and G. Strang (1986), Optimal design and relaxation of
variational problems, Communications in Pure and Applied
Mathematics, 39, pp. 113-137, pp. 139-182, pp. 353-377.

[21] U. Larsen, O. Sigmund and S. Bouwstra (1997), Design and
fabrication of compliant mechanisms and material structures with negative
Poisson's ratio, Journal of Micro Electro Mechanical Systems, 6, pp.
99-106.

[22] J.-L. Lions (1981), Some Methods in the Mathematical Analysis of
Systems and Their Control, Gordon and Breach Science Publishers,
Inc., New York.







[23] K. Lurie, A. Cherkaev and A. Fedorov (1982), Regularization of
optimal design problems for bars and plates, I, II, III, Journal
of Optimization Theory and Applications, 37, pp. 499-543, 42, pp.
247-282.

[24] M. Neves, H. Rodrigues and J. Guedes (1995), Generalized
topology design of structures with a buckling load criterion, Structural
Optimization, 10, pp. 71-78.

[25] J. Ou and N. Kikuchi (1996), Optimal design of controlled
structures, Structural Optimization, 11, pp. 19-28.

[26] N.L. Pedersen (1999), Maximization of eigenvalues using topology
optimization, Danish Center for Applied Mathematics and
Mechanics, DCAMM Report No. 620.

[27] J. Petersson (1999a), A finite element analysis of optimal
variable thickness sheets, SIAM Journal on Numerical Analysis, 36, pp.
1759-1778.

[28] J. Petersson (1999b), Some convergence results in
perimeter-controlled topology optimization, Computer Methods in Applied
Mechanics and Engineering, 171, pp. 123-140.

[29] J. Petersson and O. Sigmund (1998), Slope constrained topology
optimization, International Journal for Numerical Methods in
Engineering, 41, pp. 1417-1434.

[30] H. Rodrigues and P. Fernandes (1995), A material based model for
topology optimization of thermoelastic structures, International
Journal for Numerical Methods in Engineering, 38, pp. 1951-1965.

[31] G. Rozvany (1997), Topology Optimization in Structural Mechanics,
Springer Verlag, Heidelberg.

[32] U. Schramm (1999), The use of structural optimization in
automotive design - state of the art and vision, in Bloembaum, C., editor,
Proc. Third World Congress of Structural and Multidisciplinary
Optimization, University of New York at Buffalo, pp. 200-202.

[33] O. Sigmund (1995), Tailoring materials with prescribed elastic
properties, Mechanics of Materials, 20, pp. 351-368.

[34] O. Sigmund (1997), On the design of compliant mechanisms using
topology optimization, Mechanics of Structures and Machines, 25,
pp. 495-526.

[35] O. Sigmund (1998), Topology optimization in multiphysics
problems, in Proc. 7th AIAA/USAF/NASA/ISSMO Symposium on
Multidisciplinary Analysis and Optimization, St. Louis MI, Sept. 2-4
’98, AIAA, pp. 1492-1500.







[36] O. Sigmund and J. Petersson (1998), Numerical instabilities in
topology optimization, Structural Optimization, 16, pp. 68-75.

[37] O. Sigmund and S. Torquato (1997), Design of materials with
extreme thermal expansion using a three-phase topology optimization
method, Journal of the Mechanics and Physics of Solids, 45, pp.
1037-1067.

[38] J. Sokolowski and A. Zochowski (1999), On the topological
derivative in shape optimization, SIAM Journal on Control and
Optimization, 37, pp. 1251-1272.

[39] K. Svanberg (1987), The method of moving asymptotes - a new
method for structural optimization, International Journal for
Numerical Methods in Engineering, 24, pp. 359-373.

[40] R. Yang and A. Chahande (1995), Automotive applications of
topology optimization, Structural Optimization, 9, pp. 245-249.

[41] J. Zowe, M. Kocvara and M. Bendsøe (1997), Free material
optimization via mathematical programming, Mathematical
Programming B, 79, pp. 445-466.




MIP: THEORY AND PRACTICE - 
CLOSING THE GAP 



Robert E. Bixby 

ILOG CPLEX Division

889 Alder Avenue 

Incline Village, NV 89451, USA 

and 

Department of Computational and Applied Mathematics 
Rice University 

Houston, TX 77005-1892, USA 
bixby@caam.rice.edu



Mary Fenelon 

ILOG CPLEX Division

889 Alder Avenue 

Incline Village, NV 89451, USA 



Zonghao Gu 

As above 



Ed Rothberg 

As above 



Roland Wunderling 

As above 



1. INTRODUCTION

For many years the principal solution technique used in the practice
of mixed-integer programming has remained largely unchanged: linear
programming based branch-and-bound, introduced by Land and Doig
(1960). This is so in spite of the fact that there has been significant
progress in the theory of integer programming and in the closely related
field of combinatorial optimization. Many of the ideas developed there
have received extensive computational testing, but, until recently,
relatively little of that work has made it into the commercial codes
used by practitioners. That situation has now changed. Several such
codes, among them LINGO^1, OSL^2, and XPRESS-MP^3, as well as the
CPLEX^4 code studied in this paper, now include cutting-plane
capabilities as well as other ideas from the backlog of accumulated
theory. As suggested by the title of this paper, the gap between theory
and practice is indeed closing.

In order to fix ideas, we begin with a formal definition. A mixed- 
integer program (MIP) is an optimization problem of the form 

\[
\begin{array}{ll}
\text{minimize}   & c^T x \\
\text{subject to} & Ax = b \\
                  & l \le x \le u \\
                  & \text{some or all } x_j \text{ integral,}
\end{array}
\]

where $A$ is an $m \times n$ matrix, called the constraint matrix, $x$ is a vector of
variables, $c$ is the objective function, and $l$ and $u$ are vectors of bounds.
Thus, a MIP is a linear program (LP) plus an integrality restriction on 
some or all of the variables. This last restriction is what makes MIPs 
difficult (NP-hard, in the technical sense); it takes a well understood, 
convex problem and makes it non-convex. It also makes the mixed- 
integer modeling paradigm a powerful tool in representing real-world 
business applications. 
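
To make the definition concrete, the following minimal Python sketch solves a tiny pure-integer instance by enumerating every integer point inside the bounds. The instance data and the function name `solve_tiny_mip` are made up for illustration, and brute-force enumeration is only viable for toy problems; it is not how the codes discussed in this paper work.

```python
from itertools import product


def solve_tiny_mip(c, A, b, lower, upper):
    """Minimize c.x subject to A x = b, lower <= x <= upper, x integral,
    by enumerating every integer point in the bound box (toy sizes only)."""
    best_value, best_x = None, None
    ranges = [range(lo, hi + 1) for lo, hi in zip(lower, upper)]
    for x in product(*ranges):
        # Check every equality constraint row of A x = b.
        feasible = all(
            sum(a_ij * x_j for a_ij, x_j in zip(row, x)) == b_i
            for row, b_i in zip(A, b)
        )
        if not feasible:
            continue
        value = sum(c_j * x_j for c_j, x_j in zip(c, x))
        if best_value is None or value < best_value:
            best_value, best_x = value, x
    return best_value, best_x


# Hypothetical instance: maximize 3*x1 + 4*x2 subject to 2*x1 + 3*x2 <= 12,
# written in the minimization/equality form above with a slack variable x3.
c = [-3, -4, 0]
A = [[2, 3, 1]]
b = [12]
lower = [0, 0, 0]
upper = [4, 4, 12]

value, x = solve_tiny_mip(c, A, b, lower, upper)
print(value, x)
```

For this instance the LP relaxation (dropping integrality) attains a strictly better objective than any integral point, which is the gap that the techniques discussed later aim to close.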

The power of the mixed-integer modeling paradigm was recognized 
almost immediately, dating back to the 50s and 60s, and numerous 
attempts were made to apply it. Unfortunately, while the modeling 
paradigm was strong, the available software and computers for solving 
the models were not. The result was disillusionment, some of which per- 
sists to this day. Many potential practitioners still believe that mixed- 
integer programming is nice to talk about, but has limited practical 
applicability. An important message of this paper is that this situation 
has changed, and changed dramatically just in the last year. It is now 
possible to solve many difficult, interesting, and practical mixed-integer 
models using off-the-shelf software. 

The following is an outline of the contents of the paper. We begin
with a discussion of advances in methods for solving linear programming
problems. First we give a snapshot overview of developments in the
period from the mid-80s to 1998, and then we look at 1999. One reason to
begin with linear programming, rather than directly with mixed-integer
programming, is that linear programming is an enabling technology for
solving MIPs. That reason aside, the real motivation for including
a discussion of linear programming here is that 1999 has seen some
remarkable and unexpected improvements in the classical simplex method.

^1 LINGO is a trademark of Lindo Systems, Inc.
^2 OSL is a trademark of IBM Corporation.
^3 XPRESS-MP is a trademark of Dash Associates Ltd.
^4 CPLEX is a trademark of ILOG, Inc.

The discussion of linear programming will be followed by the mixed- 
integer programming part of the paper. The presentation emphasizes 
features. Specific topics to be discussed will include node presolve, 
heuristics for finding feasible solutions, and cutting planes. These will
be followed by extensive computational results. 

The discussion of mixed-integer programming features can be viewed 
as having two main parts. The first discusses features that attempt 
to decrease the “upper bound” (e.g., heuristics to find better integral 
solutions). The second discusses features that attempt to increase the 
lower bound (e.g., cutting planes). When the upper and lower bounds 
become equal, the computation is finished. 
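
The interplay of the two bounds can be sketched on a small 0-1 knapsack problem written in minimization form. This is an illustrative toy, not the method of any code discussed here: the LP relaxation is solved by the greedy rule that is valid for a single knapsack constraint with positive weights, nodes are explored best-first, and all names and data are assumptions made for the example. The upper bound is the best integral objective found so far, the lower bound is the smallest relaxation value among open nodes, and the search stops once they meet.

```python
import heapq


def lp_relaxation(values, weights, capacity, fixed):
    """Greedy solution of the knapsack LP relaxation with some variables fixed.
    Objective is min -sum(values[j] * x[j]). Returns (objective, index of the
    fractional variable or None, x), or None if the fixings are infeasible."""
    x = dict(fixed)
    room = capacity - sum(weights[j] for j, v in fixed.items() if v == 1)
    if room < 0:
        return None
    free = sorted((j for j in range(len(values)) if j not in fixed),
                  key=lambda j: values[j] / weights[j], reverse=True)
    frac = None
    for j in free:
        if weights[j] <= room:            # item fits entirely
            x[j] = 1
            room -= weights[j]
        elif room > 0 and frac is None:   # first item that does not fit
            x[j] = room / weights[j]
            frac = j
            room = 0
        else:
            x[j] = 0
    return -sum(values[j] * x[j] for j in x), frac, x


def branch_and_bound(values, weights, capacity):
    upper, incumbent = float("inf"), None  # best integral solution so far
    root = lp_relaxation(values, weights, capacity, {})
    heap = [(root[0], 0, {}, root)]        # open nodes keyed by LP bound
    counter = 1                            # tie-breaker for the heap
    while heap:
        lower = heap[0][0]                 # smallest bound among open nodes
        if lower >= upper - 1e-9:          # bounds have met: finished
            break
        _, _, fixed, (obj, frac, x) = heapq.heappop(heap)
        if frac is None:                   # relaxation is integral
            upper, incumbent = obj, x
            continue
        for value in (0, 1):               # branch on the fractional variable
            child_fixed = dict(fixed)
            child_fixed[frac] = value
            child = lp_relaxation(values, weights, capacity, child_fixed)
            if child is not None and child[0] < upper - 1e-9:
                heapq.heappush(heap, (child[0], counter, child_fixed, child))
                counter += 1
    return upper, incumbent


# Hypothetical data: three items, capacity 50.
best, solution = branch_and_bound([60, 100, 120], [10, 20, 30], 50)
print(best, solution)
```

In this sketch a better heuristic would lower `upper` sooner, and a tighter relaxation would raise the values stored in the heap, which are the two directions of attack described above.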

An important guiding principle of our mixed-integer algorithmic de- 
velopments is that solving MIPs often requires a “barrage” of different, 
but cooperating ideas. In other words, we try to take advantage of struc- 
tures that are common to many real-world MIPs, hoping that some or 
all will contribute to a better solution for a particular model. To do so, 
it is essential to develop good defaults, and implement the individual 
ideas in such a way that they help when they can, and otherwise hurt as 
little as possible. This approach is perhaps different from that of most 
theoretical investigations, where the goal is typically to demonstrate the 
efficacy of a particular new idea, usually in isolation. 

Finally, we consider several examples. Two of these examples will 
provide a counterbalance to the idea that good defaults are sufficient 
to handle all models. While we would like to run mixed-integer pro- 
gramming codes much as we run linear-programming codes, as black 
boxes, there will always be instances that demand some sort of tuning 
or reformulation. 

We close this section with one general remark. For many of the com- 
putational results presented in this paper, we will use geometric means 
as a method to summarize results. On occasion, when doing so, we will 
simply use the word “mean.” This usage will always refer to the geo- 
metric mean, and not the more common arithmetic mean. Arithmetic 
means can be quite misleading when applied to a set of ratios, as would 
often be the case in this paper. 
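
A small numerical illustration of this point, using made-up ratios: if one model runs twice as fast and another twice as slow, the arithmetic mean reports a 25% improvement whichever code is placed in the numerator, while the geometric mean reports no net change.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # exp of the mean logarithm; all values must be positive
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = [2.0, 0.5]                  # hypothetical old-time / new-time ratios
inverse = [1.0 / r for r in ratios]  # the same data with the roles swapped

print(arithmetic_mean(ratios))       # 1.25: "new code is faster"
print(arithmetic_mean(inverse))      # 1.25: "old code is faster"
print(geometric_mean(ratios))        # 1.0 up to rounding
```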

2. LINEAR PROGRAMMING

2.1. PROGRESS SOLVING LPS: MID-80S TO 1998

No attempt is made here to discuss linear-programming improvements 
in this period in detail. We will present just one table, followed by a 
brief discussion. A detailed discussion is a topic unto itself. 

As the following table illustrates, over the past ten years there has
been steady progress in our ability to solve linear programming
problems^5.



Model: PDS-30, Patient Distribution System
49944 rows, 177628 columns, 393657 nonzeros

Version             Time (seconds)
CPLEX 1.0 (1988)             57840
CPLEX 3.0 (1994)              4555
CPLEX 5.0 (1996)              3835



The model PDS-30 is one of a class of models introduced in Carolan,
et al. (1990), and is well-known within the linear-programming
community. For reasons that are hopefully apparent given the CPLEX 1.0 data
in the table, the larger instances in this class (e.g., PDS-30) were con- 
sidered very difficult when first introduced. The runtimes in the table 
were produced using a modern workstation, a 296 MHz Sun UltraSparc. 
Considering the improvement in machine speeds between 1990 and the 
present, probably exceeding a multiple of 1,000, describing this model 
as being very difficult in 1990 is an understatement. 

Some remarks are in order before we move on to the developments
of the last year. First, while it is not illustrated in this table, the first
release of CPLEX was already a significant improvement over at least
one of the standard portable codes available at that time, XMP, developed
by Marsten (1981). Thus, the fifteen-fold improvement for the one
problem in the table in the period 1988 to 1998 can be viewed as an
underestimate. Second, and much more important in our view, the most
significant development of the last decade is not really reflected in this
table. This development was the leap forward in robustness of linear
programming codes. They have not only become more robust in terms
of solve times, but also much more robust at handling numerical
difficulties and problems related to degeneracy. In short, linear
programming has become more and more a tool that practitioners can
simply use, embedding it as a black box in other applications without
having to worry whether it will do its job.

^5 In CPLEX 1.0 only a primal simplex algorithm was available. In subsequent versions,
primal and dual simplex algorithms, and a barrier algorithm were available. We used the
dual simplex algorithm when solving PDS-30 with CPLEX 3.0 and 5.0.

2.2. PROGRESS SOLVING LPS: 1999 

We began the work described in this section by making a simple ob- 
servation: LPs have become larger. This is the same sort of observation 
that was made ten years ago at the start of the developments highlighted 
in the previous section. Here it led us to focus specifically on models 
with at least 10,000 constraints. It also led us to focus on the simplex 
method, since it was the simplex method that seemed to be underper- 
forming on these large models. It didn’t take long to discover where the 
bottleneck lay: The solution of the two (sometimes three) linear systems 
that are necessary at each simplex iteration. These linear systems are 
commonly called BTRAN and FTRAN (see Chvátal (1983)).

It is not strictly necessary to know what the FTRAN and BTRAN
systems refer to here. The basic idea is quite simple. Imagine we are
to solve a large linear system $Lx = a$, where $L$ is a triangular matrix,
$a$ is extremely sparse, and $x$ turns out to be very sparse as well. Both
vectors often have fewer than 100 nonzeros among them, in spite of
the fact that $L$ is of order 10,000 or more (corresponding to a linear
programming problem with 10,000 or more constraints). Clearly, when
$a$ and $x$ contain this few nonzeros, it is unlikely that the cause was
cancellation during the solve; more likely is that the number of nonzeros
touched in $L$, in order to compute $x$, was very small as well. Thus,
the key to reducing the cost of the solve is to do it in an amount of
time linear in this number of nonzeros. As it turns out, though this fact
was apparently not being exploited in linear programming codes, the
existence of such an algorithm has long been known in the sparse linear
algebra community. It is equivalent to a certain, natural reachability
problem in a graph. See Gilbert and Peierls (1988).
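
The reachability idea can be sketched as follows for a lower triangular matrix stored by columns. This is a minimal illustration in the spirit of the approach cited above, not the implementation used in any LP code; the storage layout, the function name `sparse_lower_solve`, and the example data are assumptions. A depth-first search from the nonzeros of the right-hand side finds the columns that can influence the solution, and the numerical substitution then visits only those columns in topological order.

```python
def sparse_lower_solve(cols, diag, a):
    """Solve L x = a for a lower triangular L with a sparse right-hand side.

    cols[j] : list of (i, L_ij) for the off-diagonal nonzeros of column j (i > j)
    diag[j] : the (nonzero) diagonal entry L_jj
    a       : dict {index: value} with the nonzeros of the right-hand side
    Returns x as a dict over its predicted nonzero pattern.
    """
    # Symbolic phase: depth-first search in the graph with an edge j -> i
    # for every off-diagonal nonzero L_ij, started from the nonzeros of a.
    visited = set()
    postorder = []
    for start in a:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(cols[start]))]
        while stack:
            node, children = stack[-1]
            for i, _ in children:
                if i not in visited:
                    visited.add(i)
                    stack.append((i, iter(cols[i])))
                    break
            else:
                # every child of this node has been explored
                postorder.append(node)
                stack.pop()

    # Numeric phase: reverse postorder is a topological order, so x[j] is
    # final by the time column j is used to update later entries.
    x = {j: a.get(j, 0.0) for j in postorder}
    for j in reversed(postorder):
        x[j] /= diag[j]
        for i, l_ij in cols[j]:
            x[i] -= l_ij * x[j]
    return x


# Hypothetical 6 x 6 example; only columns 1, 2 and 5 are ever touched.
cols = [[(3, 1.0)], [(2, 1.0)], [(5, 2.0)], [(4, -1.0)], [], []]
diag = [2.0, 1.0, 4.0, 1.0, 5.0, 1.0]
a = {1: 3.0}

x = sparse_lower_solve(cols, diag, a)
print(x)
```

The work done is proportional to the number of nonzeros in the visited columns, independent of the order of the matrix, which is the property the paragraph above asks for.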

Removing the above bottleneck in turn made possible further
improvements to the simplex method itself. This is where the real
progress occurred. Two examples:



■ The dual simplex algorithm: It can be shown that variables
with two finite bounds often do not need to be binding in the ratio
test. Exploiting that fact leads to what one might call a “long-step”
dual simplex algorithm.

■ Fast pricing update: If the solutions of the key linear systems
at each iteration are sparse, then it is reasonable to expect that
only a small number of reduced costs will change, and hence that
appropriate update schemes can be introduced to accelerate the
choices of entering and leaving variables in the primal and dual
simplex algorithms, respectively.

For PDS-30, the resulting improvement is significant indeed: 



Model: PDS-30, Patient Distribution System
49944 rows, 177628 columns, 393657 nonzeros

Version             Time (seconds)
CPLEX 1.0 (1988)             57840
CPLEX 3.0 (1994)              4555
CPLEX 5.0 (1996)              3835
CPLEX 6.5 (1999)               165



Of course, PDS-30 is just one problem, used here as an illustra- 
tion. Much more extensive tests were done to evaluate the effects of 
the changes introduced with CPLEX 6.5. In addition to the changes 
outlined above for the simplex method, there were also important, if not 
quite as dramatic, improvements in the barrier implementations. These 
barrier improvements can be summarized as due to two things: (a) Bet- 
ter ordering algorithms for the computation of the Cholesky factoriza- 
tion, see Rothberg and Hendrickson (1998), and (b) better exploitation 
of the available level-two cache in modern computing architectures, see 
Rothberg and Gupta (1991). 

In what follows, an extensive set of test results is given to evaluate
the performance improvements in CPLEX 6.5. The results are broken
into two parts: small models and large models. Before plunging into the
details, it is perhaps worthwhile to point out the philosophy of the way
the improvements were implemented. The overall target was “robustness
and scalability” in the algorithms. At least as important as making
the algorithms better on larger models was that performance did not
degrade, and hopefully improved, on the broad middle-range of models
that dominate in practice. Indeed, while the improvements on large
models were exciting and were the impetus behind this work, the real
effort was expended in making sure that these improvements didn’t get
in the way when they didn’t help. The same theme was mentioned earlier
for mixed-integer programming.

2.3. PERFORMANCE ON SMALL LPS 

Using a 400 MHz Pentium II running a Linux operating system, 
CPLEX 5.0 was run on all models in the CPLEX library of linear pro- 
gramming problems, a library that has been collected over a period now 
exceeding ten years. For each of the primal and dual simplex algorithms, 
we collected all the models that had solve times of less than 100 seconds 
using CPLEX 5.0. Performance on these models was then compared 
to CPLEX 6.5. The following table summarizes the results, where, as 
discussed below, a ratio bigger than 1.0 means that 6.5 was faster than 
5.0: 



Performance Improvements: Small LPs
CPLEX 5.0 to 6.5

Segment          Primal Ratio  Number  Dual Ratio  Number
0 to 1 secs              1.02     145        1.02     139
1 to 10 secs             1.36      99        1.42     101
10 to 100 secs           1.59      87        1.88     102



Thus, for example, there were 99 models where the solve time with 5.0 
using primal simplex was at least 1 second and no more than 10 seconds 
(and 101 such models for dual simplex). For primal simplex, taking the 
solve time for each model with 5.0 and dividing it by the solve time for 
6.5 resulted in 99 ratios of solve times. Computing the geometric mean 
of these ratios gave a value of 1.36. Similarly, for the dual the computed 
mean was 1.42. Thus, based upon this last number, one might say 
that for models in the 1 to 10 seconds segment, the 6.5 dual was 42% 
faster than the 5.0 dual. These results were a pleasant surprise. It was 
only for larger models that we were certain there would be substantial 
improvements. 
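
The construction of one column of such a table can be sketched as below. The timing data are hypothetical, the function name `segment_means` is an assumption, and treating the segments as half-open intervals is also an assumption, since the text does not say how models exactly on a boundary were assigned.

```python
import math

# Segment bounds on the CPLEX 5.0 solve time, in seconds (half-open intervals).
SEGMENTS = [(0.0, 1.0), (1.0, 10.0), (10.0, 100.0)]


def segment_means(times_50, times_65):
    """Per segment: (geometric mean of 5.0 time / 6.5 time, number of models)."""
    log_sums = [0.0] * len(SEGMENTS)
    counts = [0] * len(SEGMENTS)
    for t_old, t_new in zip(times_50, times_65):
        for k, (lo, hi) in enumerate(SEGMENTS):
            if lo <= t_old < hi:          # the segment is chosen by the 5.0 time
                log_sums[k] += math.log(t_old / t_new)
                counts[k] += 1
                break
    return [(math.exp(s / n) if n else None, n)
            for s, n in zip(log_sums, counts)]


# Hypothetical solve times for four models.
times_50 = [0.5, 2.0, 8.0, 40.0]
times_65 = [0.5, 1.0, 2.0, 10.0]

result = segment_means(times_50, times_65)
for (lo, hi), (mean, n) in zip(SEGMENTS, result):
    print(f"{lo:g} to {hi:g} secs: mean ratio {mean:.2f} over {n} model(s)")
```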

Below, for completeness, we also list some size statistics for the above 
groups, using both the geometric mean and the median. These are 
problem sizes after presolve was applied, where presolve refers to a set 
of problem reduction routines applied prior to calling the optimization 
routines. Some of the original, unpresolved model sizes are quite
substantial and would be misleading in the current context. See Brearley, et al.
(1975), and Anderson and Anderson (1995) for a discussion of presolve
reductions.



Problem Sizes

Primal Simplex
                         Means                      Medians
                  Rows   Cols  Nonzeros      Rows   Cols  Nonzeros
0 to 1 secs        307    556      3379       337    491      2667
1 to 10 secs      1396   3316     17046      1357   2814     15114
10 to 100 secs    2774   9060     53962      3016   6912     43298

Dual Simplex
                         Means                      Medians
                  Rows   Cols  Nonzeros      Rows   Cols  Nonzeros
0 to 1 secs        287    488      2919       323    515      2961
1 to 10 secs      1424   2973     15927      1377   2814     14295
10 to 100 secs    2824   7597     45860      3266   6831     41402



As one final statistic, we give the geometric means of the solve times 
using CPLEX 6.5 for each of the above groups, using primal and dual: 



CPLEX 6.5: Mean Solution Times (seconds)

Segment          Primal Simplex  Dual Simplex
0 to 1 secs                 0.2           0.1
1 to 10 secs                2.4           2.3
10 to 100 secs             19.5          17.4



2.4. PERFORMANCE ON LARGE LPS 

We also went through the entire CPLEX library of LPs, previously 
mentioned, and collected all instances which, after application of CPLEX 
5.0 presolve, had at least 10,000 rows. From these models an attempt 
was made to remove those that appeared to be just minor variations 
on other models in the collection. In the same vein, there were several 
instances, such as the PDS models, where a whole family of models with
increasing sizes were found. In these cases, the largest instance from the 
family was included in the test-set and the others deleted. 

All runs were done on PCs with 400 MHz Pentium II processors run- 
ning a Linux operating system. Models were included in the final per- 
formance numbers only if, in presolved form, they were solvable within 
one-half Gigabyte of physical memory. This limitation was dictated by 
memory availability on our test machines. A time limit of 500,000 sec- 
onds (about 6 days) was also imposed for each run. A limit this large 
may seem excessive, but it was deemed necessary for the tests since the 
expectation was that several models would solve in several thousands 
of seconds with CPLEX 6.5 and would be a large multiple slower with
CPLEX 5.0. Comparisons would not have been possible otherwise. Note 
that in the final analysis of the data, where ratios are used to compare 
the various algorithms, models that exceeded the memory limit were
not included. However, those that reached the time limit were included, 
in such cases using 500,000 seconds as the run time. As a result the 
comparisons we made underestimated the actual improvements. 
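
The effect of the time limit on a single ratio can be illustrated with made-up run times. In this sketch the name `recorded_ratio` and the example times are hypothetical; only the 500,000 second limit comes from the text. The underestimate arises when it is the older code that is cut off; a run of the newer code that hit the limit would bias the ratio the other way.

```python
TIME_LIMIT = 500_000.0  # seconds, the limit stated in the text


def recorded_ratio(t_old, t_new, limit=TIME_LIMIT):
    """Ratio as it enters the summary when each run is cut off at the limit."""
    return min(t_old, limit) / min(t_new, limit)


# Hypothetical model: the older code would need 2,000,000 seconds,
# the newer code needs 5,000 seconds.
true_ratio = 2_000_000.0 / 5_000.0
capped = recorded_ratio(2_000_000.0, 5_000.0)
print(true_ratio, capped)   # the recorded ratio is smaller than the true one

# If both runs hit the limit, the model contributes a ratio of 1.0.
print(recorded_ratio(600_000.0, 700_000.0))
```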

Table 1 in the appendix gives size statistics for the models generated, 
ordered by the number of constraints in the model. Generic names (LP01
through LP90, ordered by increasing numbers of constraints) have been 
used since many of these models are proprietary. The mean number of 
constraints was about 50,000, with three models having over 1,000,000 
constraints. The following two tables summarize comparative perfor- 
mance as problem size grows. There are distinct tables for barrier and 
simplex (plus best) since the sets of models not meeting the memory 
restriction were different. 

Performance Improvements: Large LPs
CPLEX 5.0 to 6.5
Mean Ratios

Problems     Primal Simplex  Dual Simplex  Best
Biggest 10              8.5          22.3  18.0
Biggest 20              7.9          18.8  12.2
Biggest 30              7.4          20.2  11.3
Biggest 40              6.4          14.0   8.0
Biggest 50              5.5          11.9   6.7
Biggest 60              5.2          10.1   6.2
Biggest 70              5.1           9.1   5.6
Biggest 80              4.5           8.2   5.2
All                     4.4           8.0   5.2

Performance Improvements: Large LPs - Barrier
CPLEX 5.0 to 6.5

Problems     Mean Ratios
Biggest 10          11.9
Biggest 20           5.5
Biggest 30           4.1
Biggest 40           3.7
Biggest 50           3.6
Biggest 60           3.7
Biggest 70           3.6
All                  3.6



The first table above refers to simplex results and the best of primal 
and dual simplex and barrier, where barrier includes crossover to a basic 
solution. The total number of models in the All category for simplex 
was 86; four models failed the memory-limit test. 75 models are in the 
All category for barrier; fifteen failed the memory-limit test. 

For each model passing the memory test and for each algorithm, two 
runtimes were produced, one for CPLEX 5.0 and one for CPLEX 6.5. 
Given these sets of numbers, ratios were computed of the 5.0 time divided 
by the corresponding 6.5 time. Thus, a ratio bigger than 1.0 meant that 
6.5 was faster. 

To understand how the summary numbers in the tables were con- 
structed, consider the row labeled Biggest 30 in the first table, for the 
primal and dual simplex algorithms and “best.” To get the numbers 
in this row, we computed the geometric means of the time ratios for 
models LP58 to LP90 (excluding LP85, LP89, and LP90, which failed 
the memory test) doing so for each of the three algorithms. The results 
for the primal simplex algorithm indicate a speedup of 7.4 on average, 
for CPLEX 6.5 versus CPLEX 5.0; for the dual the speedup was 20.2 on 
average; and for the best of primal, dual, and barrier, the speedup was 
11.3. 

In summary, the overall improvements are very large indeed. The 
magnitude of these improvements was unexpected. 

One thing the first of the above tables does indicate quite clearly is 
that the dual simplex algorithm experienced a larger improvement than 
the other algorithms. This observation leads to the question of how the 
various algorithms compare to each other. Which is best? Here is a 
summary: 

CPLEX 6.5: Algorithm Comparison

Algorithms    Instances  Mean  Wins for Dual
Primal/Dual          86   2.6             56
Barrier/Dual         78   1.2             41



Thus, dividing the primal solve time for each model by the dual solve 
time and computing the geometric means of the resulting ratios gives 
the result that the dual was a factor of 2.6 faster overall. 86 models 
were included in the test. In 56 of the instances dual was the winner. 
In 30 instances primal won. For the barrier versus dual comparison, it 
was much closer, with dual winning by only a small margin, but winning 
nevertheless, with a mean ratio of 1.2. Dual was the faster algorithm in 
41 instances, while barrier won in 37 instances. 

Missing from the table, because of the focus on the dual, is the com- 
parison between barrier and primal. In that comparison barrier won 46 
times and primal 32 times, and the mean ratio was 1.8, with barrier 
the winner. Finally, doing a comparison among all algorithms, using all 
90 models (see Remark 3, below), we obtain the interesting result that 
primal won 18 times, dual 33 times, and barrier 39 times. 

Remarks: 

1 Among the 86 instances in which CPLEX 6.5 primal and dual were 
compared, primal and dual reached the 500,000 second time limit 
on one common model. This model contributed a 1.0 to the mean 
ratio. There were six additional instances in which the primal 
reached the time limit, and no additional instances for the dual. 

2 There were four models too large to be solved with any of the 
algorithms within the one-half Gigabyte limit: LP13 (because of 
the density of the LU and Cholesky factors), LP85, LP89, and 
LP90. In all four cases, limited tests were run on machines with 
more available physical memory. In each of these cases, barrier 
was clearly the superior algorithm. One of the models, LP89, has 
yet to be solved with a simplex algorithm. 

In addition to the four models just listed, there were eight models
(LP26, LP63, LP71, LP78, LP79, LP80, LP81, and LP86) that
could not be run with CPLEX 6.5 barrier within the one-half Gigabyte
memory limit, but could be run with both primal and dual.
Partial barrier tests were run with these models on larger-memory
machines. In each case the simplex method dominated the performance
of the barrier algorithm. In six of the cases, dual was the
superior algorithm, in one primal, and in one case primal and dual
produced similar performance.

3 All of the numbers presented here can be viewed as biased against 
the barrier algorithm in the following two senses. First, the floating-point
performance on the machines we used, x86 PCs, is markedly
inferior to that on most UNIX workstations. Floating-point perfor- 
mance is key to the performance of the barrier algorithm. If these 
tests had been run on machines with better floating-point per- 
formance, barrier likely would have “won” the comparison with 
dual. Second, the barrier algorithm can be run in parallel on 
shared-memory machines, and produces good speedups over a wide 
range of model characteristics. No such parallelism appears to be 
available for the simplex algorithms. If this difference in parallel 
performance had been exploited, even to the extent of using two 
processors, again barrier would have won. 

What can one say in general about the best way to solve large models? 
Which algorithm is best? If this question had been asked in 1998, our 
response would have been that barrier was clearly best for large models. 
If that question were asked now, our response would be that there is no 
clear, best algorithm. Each of primal, dual, and barrier is superior in a 
significant number of important instances. 

3. MIXED-INTEGER PROGRAMMING 

This is a discussion focused on features. We will consider the following 
topics: 

■ Heuristics 

■ Node Presolve 

■ Cutting Planes 

As mentioned earlier, a guiding principle of our MIP developments was 
to apply a “barrage” of different techniques to each model. By applying 
every technique to every model, we benefit if any of the techniques are ef- 
fective, and we free the users from having to determine which techniques 
are appropriate for their specific models. An unanticipated benefit from 
this approach was that the techniques often combine to produce results 
that would not have been possible with any one technique. The obvious 
downside is that we pay the cost of every technique, even when the
technique is not effective for that model. Much of the work of implementing
the techniques we will now discuss went into creating aggressive strategies
for determining that a technique is not helping and turning it off
automatically.

The standard technique for solving mixed-integer programming problems
is a version of divide-and-conquer known as linear-programming
based branch-and-bound, or, what is now a more correct name,
branch-and-cut. This algorithm begins by solving the linear-programming
relaxation, obtained by simply deleting the integrality restrictions. If the
solution $x^*$ of this LP satisfies all the integrality restrictions, we are
done; otherwise, some integrality restriction is violated. Picking an
integral variable $x_j$ that is currently fractional with value $x_j^*$, we
branch, creating two separate “child” problems from the single “parent”
problem, one of which has the added restriction
$x_j \le \lfloor x_j^* \rfloor$ and the other of which has the added
restriction $x_j \ge \lceil x_j^* \rceil$. At any point, if a cutting
plane is identified that cuts off the solution to the current LP, that
constraint is added to the LP. The procedure is repeated.
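
The branching loop just described can be sketched as follows. This is only an illustration under stated assumptions, not the CPLEX implementation: the toy problem (a covering knapsack, minimize cost subject to one coverage constraint over binary variables) is chosen so that the LP relaxation can be solved exactly by a greedy rule instead of a simplex code, cutting planes are omitted, and all function names and data are hypothetical.

```python
import math
from fractions import Fraction


def solve_relaxation(costs, weights, demand, fixed):
    """LP relaxation of  min c.x  s.t.  a.x >= demand, 0 <= x <= 1,
    with the variables in `fixed` pinned to 0 or 1 by earlier branching.
    Returns (value, x) or None if the node is infeasible."""
    x = {j: Fraction(v) for j, v in fixed.items()}
    need = demand - sum(weights[j] for j, v in fixed.items() if v == 1)
    # For a single covering constraint, filling by cost per unit of coverage is optimal.
    free = sorted((j for j in range(len(costs)) if j not in fixed),
                  key=lambda j: Fraction(costs[j], weights[j]))
    for j in free:
        take = min(Fraction(1), max(Fraction(0), Fraction(need) / weights[j]))
        x[j] = take
        need -= take * weights[j]
    if need > 0:
        return None
    return sum(costs[j] * x[j] for j in x), x


def branch_and_bound(costs, weights, demand):
    best_value, best_x = math.inf, None      # upper bound: best integral solution so far
    stack = [{}]                             # each node is a dict of branching fixings
    nodes = 0
    while stack:
        fixed = stack.pop()
        nodes += 1
        relaxed = solve_relaxation(costs, weights, demand, fixed)
        if relaxed is None:
            continue                         # infeasible node
        value, x = relaxed
        if value >= best_value:
            continue                         # relaxation cannot beat the incumbent
        fractional = [j for j, v in x.items() if v.denominator != 1]
        if not fractional:
            best_value, best_x = value, x    # integral relaxation: new incumbent
            continue
        j = fractional[0]
        stack.append({**fixed, j: 0})        # child with x_j <= floor(x_j*)
        stack.append({**fixed, j: 1})        # child with x_j >= ceil(x_j*)
    return best_value, best_x, nodes


# Hypothetical data: three jobs, coverage demand of 5.
value, x, nodes = branch_and_bound([5, 4, 3], [4, 3, 2], 5)
print(value, [x[j] for j in range(3)], nodes)
```

The depth-first stack is a design shortcut for brevity; a production code would choose nodes by bound and would also add cuts at each node.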

Two important quantities that are generated during the branching 
process are an objective function upper bound and an objective function 
lower bound. Upper bounds are obtained by finding feasible integral 
solutions. Lower bounds are obtained by taking the smallest optimal 
objective value for a linear-programming relaxation among all current 
active branch-and-cut nodes. In terms of these two bounds, we can 
think of node presolve and heuristics as contributing to the upper bound, 
and both node presolve and cutting planes as contributing to the lower 
bound. 



3.1. NODE PRESOLVE 

It is now standard to apply problem simplification routines to linear 
programming problems prior to solving. For integer programming, such 
“root” reductions seem to be even more important. We begin by ap- 
plying a restricted form of the reductions for linear programming, those 
that are valid for integer programs. We then apply several additional 
reductions, the main two being “bound strengthening” and “coefficient 
reduction.” See Hoffman and Padberg (1991) and Savelsbergh (1994) 
for discussions of mixed-integer “root” presolve. 

The above is a description of what we do before the branching process 
is started. What do we do within the tree? In the integer-programming 
literature there are several proposals that perform rather extensive sets 
of presolve operations at the nodes. However, our presolve needs to work 
for general-purpose models, and has to have the property that it is not 
too expensive in the event that it does not produce positive results for a
particular model. We have thus selected a very restricted kind of node
presolve, one that does not make any changes that affect the constraint 
matrix: We implemented a fast, incremental form of bound strengthen- 
ing. The following is an illustration of how bound-strengthening works. 

Example: Resource allocation. 

The problem is to decide how to split up a minute of available time
among various possible jobs. Here is a constraint:

$40x_1 + 30x_2 + 60x_3 + 60x_4 + 30x_5 + 20x_6 + 60x_7 + 40x_8 = 60$

On the right-hand side, 60 is the total number of seconds of time
available. There are eight possible choices for individual jobs. The variables,
each of which must take on the value 0 or 1, determine which of the jobs
are selected. Imagine that this constraint is part of a larger formulation.
Down in the branch-and-cut tree, it might happen that the variable $x_2$
is fixed to 1 at some node (e.g., due to previous branching on that
variable). The right-hand side of the above constraint may then be updated,
reducing it by 30 units. If we then compute upper bounds on each of the
remaining variables and round, we deduce $x_1 = x_3 = x_4 = x_7 = x_8 = 0$.
These fixings are the result of one pass of bound strengthening. A second
pass allows us to conclude $x_5 = 1$, and a third pass $x_6 = 0$.
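
The three passes in this example can be reproduced with a few lines of code. The sketch below is an assumption-laden illustration, not the incremental routine in CPLEX: it handles a single equality constraint with positive integer coefficients, computes every new bound from the bounds valid at the start of the pass, and does not detect infeasibility. The function names are hypothetical.

```python
def strengthen_once(coeffs, rhs, lb, ub):
    """One pass over sum(a_j * x_j) == rhs with integer a_j > 0 and integer x_j.

    All new bounds are derived from the bounds valid at the start of the pass."""
    min_act = sum(a * l for a, l in zip(coeffs, lb))
    max_act = sum(a * u for a, u in zip(coeffs, ub))
    new_lb, new_ub = list(lb), list(ub)
    for j, a in enumerate(coeffs):
        room = rhs - (min_act - a * lb[j])      # most that a_j * x_j can be
        must = rhs - (max_act - a * ub[j])      # least that a_j * x_j must be
        new_ub[j] = min(ub[j], room // a)       # divide and round down
        new_lb[j] = max(lb[j], -(-must // a))   # divide and round up
    return new_lb, new_ub


def strengthen(coeffs, rhs, lb, ub):
    """Repeat passes until nothing changes; list the variables fixed in each pass."""
    passes = []
    while True:
        new_lb, new_ub = strengthen_once(coeffs, rhs, lb, ub)
        if (new_lb, new_ub) == (lb, ub):
            return passes
        fixed = {j + 1: new_lb[j] for j in range(len(coeffs))
                 if new_lb[j] == new_ub[j] and lb[j] != ub[j]}
        passes.append(fixed)
        lb, ub = new_lb, new_ub


coeffs = [40, 30, 60, 60, 30, 20, 60, 40]
lb = [0, 1, 0, 0, 0, 0, 0, 0]    # x_2 has been fixed to 1 by branching
ub = [1] * 8
for number, fixed in enumerate(strengthen(coeffs, 60, lb, ub), start=1):
    print("pass", number, fixed)
```

Each dictionary maps a variable index to the value it was fixed to in that pass, so the output can be compared directly with the fixings listed in the text.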

As noted earlier, node presolve attacks both the lower and upper 
bounds simultaneously. By deriving tighter bounds on integer variables, 
it often increases the objective value of the associated relaxation and 
thus improves the lower bound. By excluding fractional values, it also 
increases the likelihood that the solution of the linear-programming re- 
laxation at a node is integer feasible, thus potentially improving the 
upper bound. As discussed in the next section, node presolve is also 
used as part of the node heuristics. 

Returning to the general case, it was important first to make the code 
incremental so that it could benefit, during branch-and-cut, when the 
processing of a node was followed immediately by the processing of one 
of its children. Also important were good choices for defaults. The 
choices we made were the following: 

■ The number of repeated applications was limited; instances exist 
where, unrestricted, long sequences of reductions can occur. 

■ Apply only to non-(0,±1) matrices; otherwise, no rounding occurs.
If there is no rounding, all bounds that are deduced are implied
by the LP. We are interested in mixed-integer reductions.

■ Node presolve is applied for the first 100 nodes processed, then
optionally discontinued depending upon its effectiveness during those
initial 100 nodes. Every 100 nodes thereafter, node presolve is
applied again, and optionally reactivated.

3.2. NODE HEURISTICS 

The idea of a node heuristic is simple. Instead of waiting for branch- 
ing to force integrality, we consider isolating the MIP at a particular 
node and applying local operations within that node to determine an 
integral solution. Typically these operations make use of the x vector 
generated as the solution of the linear-programming relaxation at that 
node and then perform some sort of “dive,” fixing an increasingly large
number of variables until either a new, best integral solution (a new
incumbent) is found, or the fixings that are made result in infeasibility
or an objective value worse than the current incumbent.

What are the reasons heuristics may help? First, having a good inte- 
gral solution as early as possible helps the overall branch-and-cut pro- 
cedure. It helps in reducing the number of nodes that are processed, 
and it speeds the processing of individual nodes by providing a tight 
objective-cutoff for the dual simplex algorithm (the method of choice 
for reoptimizing at the nodes). Second, in many real-world problems, 
high quality integral solutions are of much more importance than proofs 
of optimality. 

We used the following ingredients in our implementations: 

■ List fixing with different orders: All our heuristics involve diving, 
employing a sequence of fixings. These fixings can, for example, 
be done with basic variables first or non-basics first, or in some 
combination. Each alternative gives a different sequence. 

■ Periodic linear solves: We optionally solve LPs during the dive. 
These solves are expensive, relative to other steps, and so we limit 
the number of solves to five. 

■ Reduced-cost fixing: When an LP is solved, new reduced-cost in- 
formation is generated, and that can be used to determine new 
reduced-cost fixings. 

■ Quick and dirty node presolve: Here we leverage the existence of
the node presolve by using a restricted version to deduce implied
fixings from the preceding fixings.

With these ingredients, five different heuristics were implemented. Each
of the five is applied at the root by default. The “most successful” is
applied periodically at subsequent nodes.

3.3. CUTTING PLANES 

This is the area in which the bulk of the theoretical work has been 
done. CPLEX 6.5 includes the implementation of six different kinds of 
cutting plane routines, each with its own defaults determining when and 
how often it is applied. 

The kinds of cuts that are applied are listed below together with a 
limited set of references. 

■ Knapsack Covers: Crowder, Johnson, and Padberg (1983); Weis- 
mantel (1997). 

■ GUB Covers: Gu, Nemhauser, and Savelsbergh (1998). 

■ Flow Covers: Padberg, Van Roy, and Wolsey (1985); Gu, Nem- 
hauser, and Savelsbergh (1999). 

■ Cliques: Johnson and Padberg (1983); Atamturk, Nemhauser, 
and Savelsbergh (1998). 

■ Implied Bounds: Hoffman and Padberg (1991) 

■ Gomory Mixed-Integer Cuts: Gomory (1960). 

Knapsack covers were the first cuts to find extensive use in general 
purpose solvers, and have been successfully used in commercial codes for 
several years. GUB Covers are a mild extension of knapsack covers that
exploit the existence of GUB constraints ($\sum_j x_j \le 1$) intersecting a given
knapsack constraint. Flow covers can be viewed as closely related to
knapsacks. This class of constraints appears to be very special-purpose, 
but is really quite general. The separation step, that of actually finding 
a flow cover violated by a given x vector, uses the same approach as 
for knapsack covers. The lifting step, which attempts to strengthen the 
initially found cut by increasing the dimension of its intersection with 
the underlying convex hull of integral feasible solutions, is particularly 
important for this class, but also quite complex. Cliques are touched 
upon briefly in Example 2. Implied bound cuts are discussed below. 
Gomory cuts are the classic mixed-integer cuts introduced by Gomory 
in 1960, and recently reinvestigated by Balas, et al. (1996). As we shall
see, the power of these cuts, long neglected, is significant.

Knapsack Extensions. Knapsack covers have been recognized
in CPLEX since version 3.0. The lifting was improved significantly in
version 5.0 using ideas suggested by Martin and Weismantel (1995). In
version 6.5 the applicability of the existing routines was extended in the
following ways:

■ Equality constraints: Equality constraints that would be can- 
didates for knapsack separation, if they were inequalities, are re- 
placed by pairs of opposing inequalities. 

■ Continuous variables: Where possible, continuous variables are
replaced by appropriate bounds, depending upon the sign of the
corresponding constraint coefficient and the sense of the constraint.

■ Surrogate knapsacks: Given a collection of constraints of the
form

$\sum_{j=1}^{n} x_j \ge b, \quad x_j \le a_j y_j \ (j = 1, \ldots, n), \quad y_j \in \{0, 1\} \ (j = 1, \ldots, n),$

we replace each $x_j$ by the expression $a_j y_j$.
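
Spelled out, the surrogate knapsack substitution chains the two families of constraints. The display below is a short worked step using only the symbols of the preceding bullet; it assumes nothing beyond the stated constraints.

```latex
b \;\le\; \sum_{j=1}^{n} x_j \;\le\; \sum_{j=1}^{n} a_j y_j
\quad\Longrightarrow\quad
\sum_{j=1}^{n} a_j y_j \;\ge\; b, \qquad y_j \in \{0, 1\} \ (j = 1, \ldots, n).
```

The implied inequality involves only the binary variables, so it is a candidate for the existing knapsack cover separation.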

Implied Bounds. It is standard wisdom in integer programming
that one should disaggregate variable upper bound constraints on sums
of variables. These are constraints of the form:

$x_1 + \cdots + x_n \le (u_1 + \cdots + u_n)\, y, \quad y \in \{0, 1\},$

where $u_j$ is a valid upper bound on $x_j \ge 0$ $(j = 1, \ldots, n)$. This
single constraint is equivalent, given the integrality of $y$, to the following
collection of “disaggregated” constraints:

$x_j \le u_j y \quad (j = 1, \ldots, n)$

The reason the second, disaggregated formulation is preferred is that, 
while equivalent given integrality, its linear-programming relaxation is 
stronger. However, given the ability to automatically disaggregate the 
first constraint, these “implied bound” constraints can be stored in a 
pool and added to the LP only as needed. Where n is large this latter 
approach will typically produce a much smaller, but equally effective LP. 
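
A small numerical illustration may help here. The sketch below is hypothetical (the function name, tolerance, and data are assumptions, and it is not the CPLEX pool logic): it shows an LP point that satisfies the aggregated constraint yet violates one disaggregated constraint, and a separation step that returns only the violated members of the pool.

```python
def violated_implied_bounds(x, y, u, tol=1e-9):
    """Return the indices j whose constraint x_j <= u_j * y is violated at (x, y)."""
    return [j for j, (xj, uj) in enumerate(zip(x, u)) if xj > uj * y + tol]


# Hypothetical fractional LP point with two continuous variables.
u = [10.0, 10.0]
x = [10.0, 0.0]
y = 0.5

aggregated_ok = sum(x) <= sum(u) * y          # 10 <= 20 * 0.5 holds
to_add = violated_implied_bounds(x, y, u)     # only x_1 <= 10 y is violated
print(aggregated_ok, to_add)
```

Only the constraints returned by the separation step would be added to the LP, which is why the pooled approach keeps the LP small when n is large.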

Gomory Mixed-Integer Cuts. Gomory mixed-integer cuts were
among the first introduced but for years have had the unfortunate
reputation that they were not effective in practice. That reputation seems to
be based upon two phenomena. First, Gomory cuts are often “dense,”
adding a significant number of nonzeros to the constraint matrix. The
linear-programming solvers of the day just couldn’t handle the resulting 
increased density. Second, in the early tests, cuts were applied in a way 
that today seems obviously bad, but was quite natural at the time. Go- 
mory’s algorithm, not simply the cuts he introduced, was being viewed 
as a potential complete solution to integer programming, just as the sim- 
plex method was a “complete” solution for linear programming. Thus, 
instead of adding groups of cuts, where a group consists of as many 
“good” violated cuts as could be found, cuts were added one at a time, 
and branching was ignored. The result was that convergence was either 
very slow, or simply did not occur. 

Times have changed. Linear-programming solvers are better and we 
know cuts should be added in groups; moreover, we don’t expect cuts to 
solve the entire problem. We now realize how strong an ally intelligent 
branching can be. With these thoughts in mind, Gomory cuts become a 
very natural choice. They are the most general cuts that we have (one 
can always find a violated Gomory mixed-integer cut), they are easy 
cuts to implement, and they have the interesting, well-known property 
that they combine two important ideas: Rounding and disjunction. In 
effect, through disjunction they capture some of the effect of branching 
without increasing the number of active nodes. 

There is a nice geometry corresponding to Gomory mixed-integer cuts, 
as well as a simple, straightforward algebraic derivation. Given the im- 
portance of these cuts, we sketch both. 

First, the geometry. Consider a simple mixed inequality $x + y \ge 3.5$,
where $x \ge 0$ and $y$ is integral (not necessarily nonnegative). The feasible
region for the linear-programming relaxation has exactly one fractional
extreme point, $(0, 3.5)$. Removing this point is easy. We round,
$y \le \lfloor 3.5 \rfloor$ and $y \ge \lceil 3.5 \rceil$, and intersect the
feasible region with the resulting
pair of inequalities. The result is a pair of disjoint polyhedra, in effect,
a disjunction. This disjunction can be removed by taking the convex
hull of the two polyhedra. Equivalently, we can add the cutting plane
$2x + y \ge 4$ to the original defining inequality. This cut is exactly the
associated Gomory mixed-integer cut, perhaps more properly viewed as
a mixed-integer rounding cut in this case. See Wolsey (1998) for a further
discussion of these issues.
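
The cut in this small example can be checked numerically. The sketch below assumes the standard mixed-integer rounding formula for the set $x + y \ge b$, $x \ge 0$, $y$ integral, namely $x/f + y \ge \lceil b \rceil$ with $f = b - \lfloor b \rfloor > 0$; the function names are hypothetical and the check covers only a finite range of $y$ values.

```python
import math
from fractions import Fraction


def mir_cut(b):
    """Coefficients (coef_x, coef_y, rhs) of the cut x/f + y >= ceil(b)
    for the set x + y >= b, x >= 0, y integral, where f is the fractional part of b."""
    f = b - math.floor(b)
    if f == 0:
        raise ValueError("b is integral; the relaxation has no fractional vertex")
    return 1 / f, 1, math.ceil(b)


def cut_holds_on_range(b, y_values):
    """Check the cut at the smallest feasible x for each integer y (the worst case,
    because the coefficient of x in the cut is positive)."""
    cx, cy, rhs = mir_cut(b)
    return all(cx * max(Fraction(0), b - y) + cy * y >= rhs for y in y_values)


b = Fraction(7, 2)
cx, cy, rhs = mir_cut(b)
print(cx, cy, rhs)                           # coefficients of the cut
print(cut_holds_on_range(b, range(-5, 10)))  # validity on a sample of integer y
print(cx * 0 + cy * b >= rhs)                # is the LP vertex (0, 3.5) still feasible?
```

Exact rational arithmetic is used so that the comparison at the two tight points, $(0.5, 3)$ and $(0, 4)$, is not affected by rounding.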

Note that it is sometimes observed that Gomory cuts are weak relative 
to some of the combinatorially-derived cuts, those that can be shown to 
be facet defining. However, at least in this case, the Gomory cut is as 
strong as it can be. It defines the integer hull.

Now for the algebra. Let $y, x_j \in \mathbb{Z}_+$, and consider the equation

$y + \sum_j a_{ij} x_j = d = \lfloor d \rfloor + f, \quad f > 0.$

Think of this equation as a row of an optimal simplex tableau. Now
rounding, and introducing the notation $a_{ij} = \lfloor a_{ij} \rfloor + f_j$, we may define

$t = y + \sum (\lfloor a_{ij} \rfloor x_j : f_j \le f) + \sum (\lceil a_{ij} \rceil x_j : f_j > f) \in \mathbb{Z}.$

Subtracting yields

$\sum (f_j x_j : f_j \le f) + \sum ((f_j - 1) x_j : f_j > f) = d - t.$

Now applying a disjunction, effectively branching on $t$, we have

$t \le \lfloor d \rfloor \ \Rightarrow\ \sum (f_j x_j : f_j \le f) \ge f,$ and

$t \ge \lceil d \rceil \ \Rightarrow\ \sum ((1 - f_j) x_j : f_j > f) \ge 1 - f.$

Dividing by the right-hand side in each case, we obtain a quantity that
is always nonnegative and, for the corresponding regions of $t$ values, is
at least 1. Hence, the sum is at least 1:

$\sum \left( \frac{f_j}{f} x_j : f_j \le f \right) + \sum \left( \frac{1 - f_j}{1 - f} x_j : f_j > f \right) \ge 1.$

This inequality is a Gomory mixed-integer cut. For simplicity, we have 
described it for a pure integer constraint, but adding continuous variables 
is easy and really contributes nothing to understanding these inequali- 
ties. 
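
For the pure integer case derived above, the cut coefficients follow directly from the fractional parts of the tableau row. The sketch below is illustrative only: the row data are hypothetical, the function names are assumptions, and the validity check is a brute-force enumeration over a small grid rather than a proof.

```python
import math
from fractions import Fraction
from itertools import product


def gomory_cut(a, d):
    """Coefficients g_j of the cut  sum_j g_j x_j >= 1  obtained from the row
    y + sum_j a_j x_j = d  with y and all x_j nonnegative integers."""
    f = d - math.floor(d)
    if f == 0:
        raise ValueError("right-hand side is integral; no cut is generated")
    g = []
    for aj in a:
        fj = aj - math.floor(aj)
        g.append(fj / f if fj <= f else (1 - fj) / (1 - f))
    return g


def holds_for_all_integer_points(a, d, g, limit=8):
    """Enumerate x in {0..limit}^n and test the cut wherever y is a nonnegative integer."""
    for x in product(range(limit + 1), repeat=len(a)):
        y = d - sum(aj * xj for aj, xj in zip(a, x))
        if y >= 0 and y.denominator == 1:
            if sum(gj * xj for gj, xj in zip(g, x)) < 1:
                return False
    return True


# Hypothetical tableau row: y + (3/2) x_1 + (1/4) x_2 + (7/8) x_3 = 15/4.
a = [Fraction(3, 2), Fraction(1, 4), Fraction(7, 8)]
d = Fraction(15, 4)
g = gomory_cut(a, d)
print(g)
print(holds_for_all_integer_points(a, d, g))
```

The current LP solution has all nonbasic $x_j = 0$, so the left-hand side of the cut is 0 there and the point is cut off.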

We remark that, for a variety of reasons, it has become standard 
in courses on integer programming to present Chvátal-Gomory integer
rounding cuts. These cuts are closely related to the above, but are 
simpler to describe. They also have very nice, easily described theoretical 
properties. On the other hand, even for pure integer problems, it is the 
mixed-integer cuts that are computationally most useful. And, as we 
are about to see, the mixed-integer cuts really do work. 

3.4. COMPUTATIONAL RESULTS

We present results for two test-sets of models. The first is MIPLIB 
3.0, see Bixby, et al. (1998). This is a public-domain collection of 
problems that is used by many as the standard test-set for evaluating 
mixed-integer programming codes. To obtain the models and a complete 
set of statistics, see
http://www.caam.rice.edu/~bixby/miplib/miplib.html

We ran the following test, comparing CPLEX 6.0, which contains 
none of the enhancements described in this section, and CPLEX 6.5. 
The tests were run on a 500 MHz DEC Alpha 21264 computer with 1 
Gigabyte of physical memory. Runs were made with a time limit of 7200 
seconds. 

The MIPLIB test-set includes 59 models. Of these 59, 22 were solved*
with both codes in less than ten seconds using default settings. Of the 
remaining 37, ten hit the time limit with version 6.0 but were solved 
with version 6.5. The geometric mean of the CPLEX 6.5 solution times 
for these ten models was 48.5 seconds. Removing these ten leaves 27 
models. Eight of these 27 models were solved by neither code. In these 
eight cases, we compared the gap between the incumbent and the best 
bound at termination. Version 6.0 produced a gap that was better in 
one case, by about 0.1%. Taking the ratios of the percentage gaps in all 
eight cases, dividing the 6.0 gap by the 6.5 gap, and taking the geomet- 
ric mean yielded 3.3. Thus, the mean gap for version 6.5 was 3.3 times 
better. Removing the eight models that were solved by neither code left 
19 models. These are the ones that were (a) reasonably hard, and (b) 
solvable by both codes. The geometric mean of the solution-time ratios 
in this case was 11.2. That is, CPLEX 6.5 was over 11 times faster on 
average on these models. These results are summarized in the following 
table: 

MIPLIB 3.0 - Defaults 

CPLEX 6.0 versus CPLEX 6.5 
7200 second time limit 

• 22 models solved by both codes in less than 10 seconds 

• 10 models solved by CPLEX 6.5 and not CPLEX 6.0 

• 8 models solved by neither: CPLEX 6.5 3.3 times better gap 

• 19 models solved by both: CPLEX 6.5 11.2 times faster 
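
The speedup figures above are geometric means of per-model ratios. As a minimal sketch of that computation (the function name and the timings below are made up for illustration and are not data from these tests):

```python
import math


def geometric_mean_ratio(old_times, new_times):
    """Geometric mean of old/new ratios, computed through logarithms."""
    logs = [math.log(old / new) for old, new in zip(old_times, new_times)]
    return math.exp(sum(logs) / len(logs))


# Hypothetical solve times in seconds for three models under two code versions.
old = [120.0, 45.0, 3000.0]
new = [10.0, 50.0, 150.0]
print(geometric_mean_ratio(old, new))
```

A geometric mean is a natural choice for ratios because a model that is twice as slow and a model that is twice as fast cancel exactly, which is not true of an arithmetic mean.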

There is a second test-set of largely proprietary models that we prefer
to MIPLIB 3.0 in evaluating performance. This test-set was assembled
about two years ago, from the CPLEX model library, in the following
way. On some machine (what was then the fastest machine available to
us), and using the then current version of CPLEX, we ran each model
using defaults. Any model that solved to optimality in less than 100
seconds was excluded from further testing. We then made an extensive
set of runs on the remaining models, some runs extending to several
days, trying a variety of parameter settings. All the models that could
be solved in this way were included in the test-set, with the exception
of a few models (fewer than five) that were solvable, but took over about
one-half day to solve. With these exceptions, one might characterize the
resulting test-set as the models that appeared to be difficult but solvable,
assuming tuning was allowed. Statistics for the 80 models in the test-set
are given in Table 2 in the appendix (“GIs” stands for general integer
variables). For the present paper, we made several kinds of runs. All
used a 500 MHz DEC Alpha 21264 system and were run with a time
limit of 7200 seconds.

* The default CPLEX optimality tolerance for MIPs is a gap of 0.01%.

First, we ran CPLEX 6.5 and CPLEX 6.0 with defaults. The result 
was that 6.5 did not solve three of the models to within default tolerances 
within the allotted two hours (MIP09, which is the MIPLIB 3.0 model 
arki001, MIP40, and MIP50). CPLEX 6.0 did not solve 31 models. The
three models not solved by 6.5 were among these 31. Excluding these 
three, there was one model where the solution times were identical (and 
small). Version 6.5 was faster in 66 of the remaining cases, and 6.0 was 
faster in ten cases. Dividing the 6.0 time by the 6.5 time and taking the 
geometric mean† gave a mean speedup of 22.3.

We next compared CPLEX 6.5 running defaults with tuned CPLEX 
6.0 times, using the best parameter settings that are known to us. Ver- 
sion 6.5 was faster in 56 cases, and 6.0 in 22 cases. The mean speedup for 
version 6.5 using default settings compared with 6.0 using tuned settings 
was 3.8. 

Finally, we performed two kinds of tests to evaluate the effects of some 
of the mixed-integer programming features that have been discussed in 
this paper. In the first test we started with defaults, turned off indi- 
vidual features one at a time, and measured, using geometric means of 
ratios of solve times, the effects of these changes. Our second set of tests 
was performed, effectively, in the opposite direction. We turned off all 
six kinds of cutting planes, made a set of test runs, and then turned on 
the individual cuts one at a time, making comparisons using ratios and
geometric means. The results are given below: 



† MIP45 was excluded from these ratios. It terminated prematurely with CPLEX 6.0 because
of excessive basis singularities in the simplex method while solving some node LP.

Performance Impact: Relative to Defaults

    Cuts                       Other
    Implied bounds     0%      Node presolve      9%
    Cliques            0%      Node heuristics    9%
    GUB covers         0%
    Flow covers       12%
    Covers            16%
    Gomory cuts       35%

Performance Impact: Individual Cuts

    Implied bounds    -1%
    Cliques            0%
    GUB covers        10%
    Flow covers       18%
    Covers            58%
    Gomory cuts       97%

The big winner here, and perhaps the biggest surprise, was Gomory cuts. 
They were clearly the most effective cuts in our tests. 

4. EXAMPLES 

We close with some examples. In the previous sections we have at- 
tempted to demonstrate that great progress has been made in building 
general-purpose mixed-integer solvers, solvers that run well with default 
settings. This development is critical to the wider use of mixed-integer 
programming in practice. Most users of mixed-integer programming are 
not interested in the details of how the codes work. They simply want 
to be able to run a code and get results. Nevertheless, there still are, 
and probably always will be, many examples of interesting, important 
MIPs that are solvable, but not without taking advantage of problem 
structure in some special way. 

Example 1 The first example is from a customer who was primarily 
interested in finding feasible solutions. His criterion was: stop after finding
a feasible integral solution with gap less than 1%. CPLEX 6.0 was
incapable of meeting this criterion. Indeed, this model was left running
for a period of several days on a fast workstation, a 600 MHz Alpha 
21164 computer, and not a single feasible solution was found. Below is 
a CPLEX 6.5 run for this model using a 500 MHz DEC Alpha 21264 
computer:

    Problem 'unnamed.mps.gz' read.
    New value for passes for generating fractional cuts: 0
    New value for mixed integer optimality gap tolerance: 0.01
    Reduced MIP has 7787 rows, 7260 columns, and 22154 nonzeros.
    Clique table members: 533
    Root relaxation solution time = 0.37 sec.

            Nodes                                         Cuts/
       Node  Left     Objective  IInf  Best Integer     Best Node    ItCnt

          0     0   -2.8298e+07   224                 -2.8298e+07     3095
                    -2.7769e+07   160                   Cuts: 616     4173
                    -2.7720e+07   156                   Cuts: 118     4548
                    -2.7703e+07   176                    Cuts: 54     4790
                    -2.7689e+07   177                    Cuts: 36     4916
                    -2.7685e+07   180                    Cuts: 29     4999
                    -2.7685e+07   181                 Flowcuts: 6     5035
        100   100   -2.7590e+07    58                 -2.7684e+07     6174
        200   200   -2.7446e+07    12                 -2.7684e+07     6673
        239   236   -2.7434e+07     0   -2.7434e+07   -2.7684e+07     6843

    GUB cover cuts applied:  6
    Cover cuts applied:  44
    Implied bound cuts applied:  66
    Flow cuts applied:  295
    Fractional cuts applied:  212

    Integer optimal solution (0.01/1e-06):  Objective = -2.7433577522e+07
    Current MIP best bound = -2.7684321743e+07 (gap = 250744, 0.91%)
    Solution time =   31.90 sec.  Iterations = 6843  Nodes = 240 (235)



So, this is an example of a model that now solves well with default 
settings. One interesting aspect of the solution is that it is a case in 
which several features combined to produce the result. Clearly cuts 
were involved, and, although it is not clear from the output, the node 
presolve was also important. Each of several, separate features helps, 
but it’s the combination that leads to a solution. 

Example 2 Our second example illustrates how defaults are some- 
times not enough. In CPLEX 6.5, several degrees of probing on binary 
variables are available. These options are not turned on by default. As 
is well known, even with an efficient implementation of probing, compu- 
tation times can experience a combinatorial explosion. 

Probing occurs in three phases in CPLEX 6.5 when activated at its
“highest level.” In the first phase, it is applied to individual binary
variables, as suggested in Brearley, et al. (1975). Thus, each binary
variable is fixed in turn to 0 and then to 1, applying bound strengthening
after each such fixing. For an individual variable, the result can
include the fixing of the variable being probed (when one of the tested
values forces the infeasibility of the whole model), implied bounds on
continuous variables (hence, implied bound cuts become stronger when
probing is activated), and 2-cliques. The 2-cliques that result from this
first phase are collected and merged together with the cliques given by
GUB constraints, those that are explicit in the original formulation. The
result is an initial clique table. This table is then further expanded in
a second phase by applying lifting directly to the cuts in this table. See
Suhl and Szymanski (1994). This idea was suggested to us by Johnson
(1999).

Finally, since, in general, there can be an exponential number of max- 
imal cliques, it is not possible to explicitly store all such cliques. Within 
the branch-and-cut tree we use the clique table and the current solution 
to the linear-programming relaxation, as suggested by Atamturk, et al.
(1998), to generate further clique cuts.

When we first tried to solve the present example model, it appeared 
not to be possible with CPLEX 6.0. The optimal objective value of the 
root linear-programming relaxation was 1.0, and the best-bound value 
never moved above 2.0, even though several parameter settings were 
tried and several long runs were made. 

In the CPLEX 6.5 run displayed below, probing was set to its highest 
level. The result was that a large number of clique inequalities were 
generated at the root. These were crucial, pushing the lower bound at 
the root to 20.8. At the same time, one of the five heuristics that are 
applied at the root succeeded in finding a feasible solution of value 21. 
Since, as the output indicates, the objective function in the model could 
be proven to take on only integral values, it followed that 21 was optimal, 
and the run terminated without branching. 



    Problem 'unnamed.lp.gz' read.
    New value for probing strategy: 2
    Elapsed time 10.22 sec. for 47% of probing
    Elapsed time 20.30 sec. for 94% of probing
    Probing time = 21.53 sec.
    Clique table members: 1068
    Root relaxation solution time = 142.18 sec.
    Objective is integral.

            Nodes                                         Cuts/
       Node  Left     Objective  IInf  Best Integer     Best Node    ItCnt     Gap

          0     0        1.0000  4766                      1.0000    20704
                        20.8000   439                Cliques: 500    36839
                        20.8000   203                 Cliques: 17    40402
    Heuristic: feasible at 22.0000, still looking
    Heuristic complete
    *     0+    0       21.0000     0       21.0000       20.8000    40402    0.95%

    Clique cuts applied:  349

    Integer optimal solution:  Objective = 2.1000000000e+01
    Solution time =  792.62 sec.  Iterations = 40402  Nodes = 0

Example 3 The noswot model is one of the smaller, but more difficult
models in the MIPLIB 3.0 test-set. It has only 128 variables, 75 of which
are binary, and 25 of which are general integers.

This model is very difficult to solve with the currently available branch- 
and-cut codes. With CPLEX 6.0 it appeared to be unsolvable, even after 
days of computation. It is now solvable with CPLEX 6.5, running de- 
faults, but the solution time of 22445 seconds (500 MHz DEC Alpha 
21264), and the enumerated 26,521,191 branch-and-cut nodes, are not a 
pleasant sight. 

In contrast, this model does suddenly become easy if the following 
eight constraints are added: 

    c184: X21 - X22 >= 0
    c185: X22 - X23 >= 0
    c186: X23 - X24 >= 0
    c187: 2.08 X11 + 2.98 X21 + 3.47 X31 + 2.24 X41 + 2.08 X51
          + 0.25 W11 + 0.25 W21 + 0.25 W31 + 0.25 W41 + 0.25 W51 <= 20.25
    c188: 2.08 X12 + 2.98 X22 + 3.47 X32 + 2.24 X42 + 2.08 X52
          + 0.25 W12 + 0.25 W22 + 0.25 W32 + 0.25 W42 + 0.25 W52 <= 20.25
    c189: 2.08 X13 + 2.98 X23 + 3.4722 X33 + 2.24 X43 + 2.08 X53
          + 0.25 W13 + 0.25 W23 + 0.25 W33 + 0.25 W43 + 0.25 W53 <= 20.25
    c190: 2.08 X14 + 2.98 X24 + 3.47 X34 + 2.24 X44 + 2.08 X54
          + 0.25 W14 + 0.25 W24 + 0.25 W34 + 0.25 W44 + 0.25 W54 <= 20.25
    c191: 2.08 X15 + 2.98 X25 + 3.47 X35 + 2.24 X45 + 2.08 X55
          + 0.25 W15 + 0.25 W25 + 0.25 W35 + 0.25 W45 + 0.25 W55 <= 16.25

Where do these constraints come from? Some time ago, one of the
authors of this paper discovered what looked like a high degree of
symmetry among some of the variables in the model: X21, X22, X23, and
X24. He tried the following idea. There are 24 different ways of forming
triples of constraints from these variables, in the way indicated above by
constraints c184-c186, with each of these triples removing the symmetry
on the variables. Being uncertain that his symmetry observation was
really valid for the entire model, he then simply solved the 24 individual
instances, and, in so doing, the entire model.
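
The count of 24 is the number of orderings of the four variables (4! = 24), each ordering giving one chain of three inequalities. A minimal sketch of generating these triples in LP-file style follows; the function name is hypothetical, and the text does not say how the instances were actually produced.

```python
from itertools import permutations


def ordering_constraints(variables):
    """Yield, for each ordering (v1, v2, ...), the chain v1 - v2 >= 0, v2 - v3 >= 0, ..."""
    for order in permutations(variables):
        yield [f"{a} - {b} >= 0" for a, b in zip(order, order[1:])]


triples = list(ordering_constraints(["X21", "X22", "X23", "X24"]))
print(len(triples))
print(triples[0])
```

Appending one triple to the original model gives one of the restricted instances; solving all of them covers every ordering, and therefore the whole model, whether or not the symmetry is genuine.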

As some explanation for why this approach, creating 24 related instances,
could be effective, consider taking several disjoint copies of the
same model and putting them side by side in a single model. Doing so is
not a good idea; the models, even if they are slightly different, should be
solved individually. However, at least for pure LPs, something reasonable
will happen, and the total solution time will grow in some way that
is not too highly nonlinear in the number of disjoint copies that have
been combined. Indeed, in the case of a barrier algorithm, the total
computation time can be expected to grow something close to linearly
in the number of copies. However, doing this kind of replication with
an integer program is an entirely different matter. There the number
of nodes in the search, and hence the solution time, can be expected to
grow like the product of the number of nodes in the individual search
trees.
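The contrast between additive and multiplicative growth can be made concrete with a little arithmetic. The sketch below uses made-up sizes (100 units of work per LP copy, 100 nodes per integer-programming copy); both functions are idealized models of the growth described above, not measurements.

```python
def combined_lp_cost(cost_per_copy, copies):
    """Idealized additive growth, as suggested above for pure LPs."""
    return cost_per_copy * copies

def combined_mip_nodes(nodes_per_copy, copies):
    """Idealized multiplicative growth: product of the individual tree sizes."""
    return nodes_per_copy ** copies

# Hypothetical sizes chosen only for illustration.
for copies in (1, 2, 3, 4):
    print(copies, combined_lp_cost(100, copies), combined_mip_nodes(100, copies))
```

Under these assumptions four copies cost 400 units as an LP but 100^4 nodes as a single integer program, which is why solving the related instances separately is so much cheaper.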

Returning to the noswot instance, the above result prompted one
of our co-workers, Irv Lustig (1999), to “reverse engineer” the original
model, and give a representation using the OPL modeling language
(see Van Hentenryck (1999)). Another co-worker, Jean-Francois Puget
(1999), then studied this representation and noticed that it could be
given an interpretation as a resource allocation model on five machines,
with scheduling, horizon constraints, and transition times. It was then
clear that four of the five “machines” were indeed identical, and hence
that constraints c184-c186 were valid. In other words, it was necessary
to solve only one of the 24 instances mentioned above. In addition, it
was also observed that the transition-time constraints could be strengthened
by adding five additional cuts that exploited the fact that there was
actually a minimum positive transition cost of 0.25. Essentially the argument
was that if a machine performs k different jobs, then it must
pay at least 0.25(k - 1) in transition cost. These last constraints are also
due to Puget.

With these added constraints, the model becomes solvable. Here are 
the results using CPLEX 6.0 and 6.5 on a 400 MHz Pentium II Laptop 
running a Linux operating system: 

CPLEX 6.0: 142 seconds 169090 nodes 

CPLEX 6.5: 16 seconds 9807 nodes 

So, this is a case where good modeling makes the biggest difference, but 
having a stronger code is also valuable. 

5. SUMMARY 

In this paper we have discussed recent advances in linear and mixed- 
integer programming. The linear-programming improvements were most 
striking for larger models, but are effective for small and medium-sized 
models as well. One important consequence of this work is that for 
large models barrier algorithms are no longer dominant; each of primal 
and dual simplex, and barrier is now the winning choice in a significant 
number of cases. 

For mixed-integer programming, the improvements were dramatic. 
These resulted from mining an extensive backlog of theoretical ideas 
from the scientific literatures for integer programming and combinatorial 
optimization. Particular attention was given to developing good default 
implementations of these ideas so that they could be applied in concert,
each helping on the problems to which they applied, while causing a 
minimal degradation in performance when they didn’t apply. 
Appendix: Problem Size Statistics



Table 1 Large LP Statistics

Model     Rows   Columns   Nonzeros
LP01     10295     50040     150110
LP02     13005     77133     361567
LP03     14738     33025     151383
LP04     15014     37372     103866
LP05     15051     34553     132295
LP06     15349     35215     162709
LP07     15455     59942     225514
LP08     15540     23752      86753
LP09     16223     28568      88340
LP10     16768     39474     203112
LP11     17681    165188     690273
LP12     18262     23211     136324
LP13     18750     84375    9993717
LP14     19103     33490     276895
LP15     19374    180670    5392558
LP16     19519     45832     124280
LP17     19844     55528     152952
LP18     19999     85191     170369
LP19     21019    115761     728432
LP20     22513     99785     337746
LP21     22797     63995     172018
LP22     23610     44063     154822
LP23     23700     23005     169045
LP24     23712     31680      81245
LP25     24377     46592    2139096
LP26     26618     38904    1067713
LP27     27349     97710     288421
LP28     27441     15128      96118
LP29     27899     26243     261968
LP30     28240     55200     161640
LP31     28420    164024     505253
LP32     29002    111722    2632880
LP33     29017     20074    2001102
LP34     29147      9984    1013168
LP35     29724     98124     196524
LP36     30190     57000     623730
LP37     30258    492266    1162517
LP38     31770    272372     829040
LP39     33440     56624     161831
LP40     34994     87510     208179
LP41     35519     43582     557466
LP42     35645     34675     208769
LP43     36400     92878     246006
LP44     38782    261079    1508199
LP45     39951    125000     381259
LP46     41340     64162     370839
LP47     41344    163569    1928534
LP48     41366     78750    2110518
LP49     43387    107164     189864
LP50     43687    164831     722066
LP51     44150    200077    4966017
LP52     44211     37199     321663
LP53     47423     81915     228565
LP54     48097    150138    1195800
LP55     48548    163200     617683
LP56     54447    326504    1807146
LP57     55020    117910     391081
LP58     55463    191233     840986
LP59     60384    100078     485414
LP60     63856    144693     717229
LP61     66185    157496     418321
LP62     67745    111891     305125
LP63     69418    612608    1722112
LP64     72258    226090    2242086
LP65     84840    316800    1899600
LP66     95011    197489     749771
LP67     99578    326504    2102273
LP68    105127    154699     358171
LP69    108393    112955     602948
LP70    118158    487427     974854
LP71    123964     93288     459680
LP72    125211    159109     457198
LP73    129181    467192    1025706
LP74    155265    377918     930166
LP75    175147    358239    1211488
LP76    179080    707556    1570514
LP77    185929    189867    2787708
LP78    186441     23732     397080
LP79    209760    363092    1061495
LP80    238969    772273    5795991
LP81    269640   1205640    6481640
LP82    280756    920198    5936426
LP83    319256    638512    1231403
LP84    344297    559428    1909649
LP85    363458    146096   11470110
LP86    589250   1533590    5327318
LP87    716772   1169910    2511088
LP88   1000000   1685236    3370472
LP89   1204750   1229623    4693571
LP90   1709857   1903725    4959650



Table 2 Large MIP Statistics

Model    Rows  Columns  Binaries   GIs
MIP01     230     2025      1800     0
MIP02     759    17561     17561     0
MIP03    4089   121871    121870     1
MIP04    4116    41428     41427     1
MIP05     823     8904      8904     0
MIP06     426     7195      7195     0
MIP07    1095    11005     10940    65
MIP08    1838      807       807     0
MIP09    1048     1388       415   123
MIP10    2597     2288      1166  1122
MIP11     123      133        39    32
MIP12     105      117        34    30
MIP13      91      104        30    28
MIP14    8619     5428      1305     2
MIP15      37      526       526     0
MIP16     396      162       146     8
MIP17     631      783        28     0
MIP18    2176     6000      6000     0
MIP19     113      392       391     0
MIP20     236     1282      1277     0
MIP21     827      961       152     0
MIP22    2588      435       435     0
MIP23      15      154         0   153
MIP24     852     1337        19     0
MIP25      80      500       500     0
MIP26    4036      769       190     0
MIP27      41       49         0    30
MIP28     516    47311     47311     0
MIP29     582    55515     55515     0
MIP30     363     1298      1254     0
MIP31    2291     1992       174    12
MIP32    6256     8537       197     0
MIP33    1392     1224       240   168
MIP34    1392     1224       240   168
MIP35    1248     1224       384   336
MIP36    1368     1152       216   168
MIP37    1224     1152       336   336
MIP38    2407     1214       802     0
MIP39    3147     2505       388     1
MIP40     192      845       845     0
MIP41    1799     1008         0  1008
MIP42      43       51         0    39
MIP43     146      578       444     0
MIP44    2094     5592       443  3212
MIP45     684     1564       235     0
MIP46      68      151       150     0
MIP47      13      151       150     0
MIP48      12      151       150     0
MIP49     148     1280      1280     0
MIP50     788      645       140     0
MIP51     212      260       259     0
MIP52    2054    10724     10724     0
MIP53     908      129        31     0
MIP54    4480    10958        96     0
MIP55     291      422        98     0
MIP56    2280     1090         0  1090
MIP57      36    87482     87482     0
MIP58     176      548       548     0
MIP59     755     2756      2756     0
MIP60      45       86        55     0
MIP61     246      240        64     0
MIP62    1192      840        48     0
MIP63    2984     1451      1451     0
MIP64     291      556       300    15
MIP65     249      690       690     0
MIP66     314     5111        41     0
MIP67   20022    17665     17664     0
MIP68   23259    29342     13215     0
MIP69     524     1197      1100    96
MIP70     331       45        45     0
MIP71     146      578       444     0
MIP72      42    17419     17419     0
MIP73    3228    15541     15540     0
MIP74    1359     1959         0  1959
MIP75     234      378       168     0
MIP76     234      378       168     0
MIP77    4277     2417      1364     0
MIP78     845     3345       235     0
MIP79   10108     3836      1862     0
MIP80      27    26306     26306     0
Abstract The paper discusses a new view on globalization techniques for Newton’s
method. In particular, strategies based on “natural level functions” are 
considered and their properties are investigated. A “restrictive mono- 
tonicity test” is introduced and theoretically motivated. Numerical re- 
sults for a highly nonlinear optimal control problem from aerospace 
engineering and a parameter estimation for a chemical process are pre- 
sented. 



1. INTRODUCTION 

It is well-known that stepsize strategies based on suitable merit func- 
tions can globalize the convergence of the damped Newton method. Ex- 
perience shows, however, that the standard choices for merit functions 
may enforce very small stepsizes when the problems are only mildly 
ill-conditioned, even in the domain of full-step local convergence, thus 
making the method very inefficient. So-called “natural level functions”,
originally introduced by Deuflhard, can avoid this effect, but up to now
lack a rigorous convergence theory. The present paper presents a new
view on successful globalization strategies based on these merit functions.
In particular, it is shown that a stepsize criterion given by the
authors (the so-called “restrictive monotonicity test”) provides a theoretical
justification. Extensions to the related problems of constrained
least squares and constrained $\ell_1$ parameter estimation problems are suggested.
Numerical results for real-life applications from aerospace control
problems and parameter estimation for chemical processes are given.

2. NEWTON’S METHOD FOR NONLINEAR 
EQUATIONS 

We consider a finite dimensional but possibly large system of highly 
nonlinear equations 



F{x) = 0. 

Starting from an initial guess $x^0$, Newton’s method improves a given
estimate $x^k$ iteratively by applying the formula

$$x^{k+1} = x^k + \Delta x^k. \tag{1}$$

The increment $\Delta x^k$ solves the linear system of equations

$$F(x^k) + J(x^k)\,\Delta x^k = 0, \tag{2}$$

where $J := \partial F / \partial x$.
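As a minimal sketch of the full-step iteration (1)-(2), assuming a two-dimensional system so that the linear solve can be written out by Cramer's rule: the test system and the names `solve2` and `newton_full_step` are illustrative choices, not part of the paper.

```python
def solve2(a, b):
    """Solve the 2x2 linear system a * y = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0.0:
        raise ZeroDivisionError("singular Jacobian")
    y0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    y1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [y0, y1]

def newton_full_step(F, J, x, tol=1e-12, max_iter=50):
    """Iterate x <- x + dx with J(x) dx = -F(x) until the increment is tiny."""
    for _ in range(max_iter):
        fx = F(x)
        dx = solve2(J(x), [-fx[0], -fx[1]])   # linear system (2)
        x = [x[0] + dx[0], x[1] + dx[1]]       # update (1)
        if (dx[0] ** 2 + dx[1] ** 2) ** 0.5 < tol:
            break
    return x

# Illustrative test system (an assumption): x0^2 + x1^2 = 4 and x0 = x1.
def F(x):
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]

def J(x):
    return [[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]]

root = newton_full_step(F, J, [1.0, 2.0])
print(root)
```

Started close enough to the solution, the increments shrink roughly quadratically, which is the behaviour the local theory below makes precise.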

The local convergence properties of this “full-step” version of New- 
ton’s method have been investigated thoroughly and may be formulated 
as follows. 

Theorem 1 (local convergence properties) Let $F : D \subset \mathbb{R}^n \to \mathbb{R}^n$
be twice continuously differentiable, $J(x)$ be nonsingular for all $x \in D$,
and $D$ be a domain. Assume further that

$$\|J(y)^{-1}\,(J(x + t\Delta x) - J(x))\,\Delta x\| \le \omega\, t\, \|\Delta x\|^2, \qquad \omega < \infty, \tag{3}$$

for all $t \in\, ]0,1]$, $x, y = x + \Delta x \in D$ with $\Delta x = -J(x)^{-1}F(x) \ne 0$, i.e.
a global bound $\omega$ for the “curvature” exists, and that the initial guess $x^0$
is sufficiently near to a solution:

$$\delta^0 := \frac{\omega}{2}\,\|\Delta x^0\| < 1. \tag{4}$$
Then the following holds:

■ if $D^0 := B\bigl(x^0, \|\Delta x^0\| / (1 - \delta^0)\bigr) \subset D$, then the sequence of iterates
defined by (1) remains in $D^0$,

■ there exists $x^* \in D^0$ with $F(x^*) = 0$ and $x^k \to x^*$ $(k \to \infty)$,

■ an a priori error estimate holds

$$\|x^k - x^*\| \le (\delta^0)^{2^k - 1}\, \frac{\|\Delta x^0\|}{1 - \delta^0},$$

■ and convergence is quadratic with

$$\|\Delta x^{k+1}\| \le \frac{\omega}{2}\,\|\Delta x^k\|^2.$$

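The proof follows the lines of Banach’s Fixed Point Theorem. Contractivity
of the sequence of iterates is given by

$$\|\Delta x^{k+1}\| = \Bigl\| J(x^{k+1})^{-1} \int_0^1 \bigl(J(x^k + t\Delta x^k) - J(x^k)\bigr)\Delta x^k\, dt \Bigr\| \le \int_0^1 \omega\, t\, \|\Delta x^k\|^2\, dt = \frac{\omega}{2}\,\|\Delta x^k\|^2.$$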

Since from here

$$\|x^{k+p} - x^k\| \le \sum_{i=0}^{p-1} \|\Delta x^{k+i}\| \le \frac{\|\Delta x^k\|}{1 - \delta^0},$$

we can conclude by induction that the sequence $\{x^k\}$ remains in $D^0$ and
is a Cauchy sequence, so $x^*$ exists. Finally, $F(x^*) = 0$ follows from continuity
and boundedness of $J$ on $D^0$. □



The local convergence theorem allows some interpretations. 

1 The constant $\omega$ in (3) is a bound on the nonlinearity of the problem,
and its inverse $1/\omega$ characterizes the size of the region in which
the linearization (2) is an acceptable approximation of $F$. Hence,
condition (4) can also be read as

$$\|\Delta x^0\| \le \frac{\eta}{\omega}, \tag{5}$$

for some constant $\eta < 2$, i.e. the increment step should not exceed
this region.

2 In the literature, condition (3) is typically replaced by the two
conditions $\|J(x)^{-1}\| \le \beta < \infty$, $\|J(y) - J(x)\| \le \gamma\,\|y - x\|$, $\gamma < \infty$.
However, $\beta\gamma$ grossly over-estimates the weaker bound $\omega$.

3 In highly nonlinear problems, though, even for the weaker bound $\omega$
one cannot expect the initial guess to be close enough to a solution
for condition (4) or (5) to hold. One may rather expect

$$\|\Delta x^0\| \gg \frac{1}{\omega},$$

in which case convergence of the “full-step” Newton method from
$x^0$ cannot be hoped for.

3. GLOBALIZATION BY 
UNDERRELAXATION 

One way to globalize the convergence of Newton’s method is by damping
or underrelaxation. The iterates are then defined by

$$x^{k+1} = x^k + t^k \Delta x^k,$$

where $t^k$ is a relaxation factor, also called the stepsize. The stepsize
is chosen such that the next iterate $x^{k+1}$ is “better” than $x^k$. It
is determined by a line search with respect to an appropriate “merit
function” or “level function” $T(x)$.

Any piecewise continuously differentiable level function which satisfies
the compatibility condition

$$\Delta x^k \ne 0 \;\Rightarrow\; \frac{d}{d\varepsilon}\, T(x^k + \varepsilon \Delta x^k)\Big|_{\varepsilon = 0} < 0$$

is appropriate to ensure global convergence when the Jacobians are
bounded away from singularity. (Note that at a minimum $x^*$ of a compatible
level function $\Delta x^* = 0$, hence $F(x^*) = 0$.) The classical choice
of a merit function for the underrelaxed Newton method is

$$T(x) := \|F(x)\|_2^2$$

in any suitably scaled Euclidean norm of $F(x)$. For an exact or approximate
line search for this level function one easily shows the property:

Lemma 1 (global convergence of damped Newton) Assume that the level set

$$N_\alpha := \{x \mid \|F(x)\|_2^2 \le \alpha\}$$

is compact and is contained in $D$, that $F$ is twice continuously differentiable
and that $J(x)$ is nonsingular on $N_\alpha$. Then for all $x^0 \in N_\alpha$ there
exist a stepsize sequence $\{t^k\}$ and $x^* \in N_\alpha$ such that $x^k \to x^*$ $(k \to \infty)$
with $F(x^*) = 0$.

However, it is well known that already in mildly ill-conditioned problems
such a stepsize strategy may be very inefficient since it may produce
small stepsizes even in the domain where the full-step Newton’s method
converges according to Theorem 1. The reason is the following. In
ill-conditioned cases, the Newton increment

$$\Delta x^k = -J(x^k)^{-1}F(x^k)$$

may be nearly orthogonal to the steepest descent direction

$$-\nabla T(x^k) = -2J(x^k)^T F(x^k),$$

so that enforcing descent of the level function leads to very small stepsizes.
This is due to the fact that with high probability the cosine of the
angle between the two directions

$$\cos\varphi = \frac{F(x^k)^T F(x^k)}{\|J(x^k)^{-1}F(x^k)\|\,\|J(x^k)^T F(x^k)\|} \ge \frac{1}{\operatorname{cond} J(x^k)}$$

will actually be near its lower bound $(\operatorname{cond} J(x^k))^{-1}$.



Example (Rosenbrock-type) 

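Let us consider the system of two nonlinear equations

$$F(x) = 0, \qquad F(x) = \begin{pmatrix} x_1/\sigma_1 \\ \bigl((x_1 - 50)^2/200 + x_2\bigr)/\sigma_2 \end{pmatrix}, \qquad \sigma_1 = 1, \quad \sigma_2 = \frac{1}{50},$$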

with the initial estimate $x^0 = (50, 1)^T$ and the solution $x^* = (0, -12.5)^T$.
In this example, the condition number of the Jacobian $J(x)$ is near 50 for
all $x \in D$, which is very moderate compared to the practical nonlinear
equations appearing typically in BVP. One can easily check that the
conditions of Theorem 1 hold. For $D = \mathbb{R}^2$ the curvature is bounded
by $\omega \le 0.01$. Convergence for the full step method is guaranteed in
$[-100,100] \times [-100,100]$, where $\delta(x) := \|J(x)^{-1}F(x)\|\,\omega/2 < 1$. For
the initial point $x^0$ we have $\Delta x^0 = -(50,1)^T$, and the estimate
$\delta^0 \le 50.01/200$ holds. The first iteration provides $x^1 = (0, 0)^T$ and the a priori
estimate $\|x^1 - x^*\| \le 17$ holds. The application of damped Newton with
$\|F(x)\|_2^2$ as level function, however, gives the stepsize $t^0 \approx 0.077$ as the
optimal relaxation factor. Indeed, the direction of the steepest descent
of the level function $T$ at $x^0$, namely $-\nabla T(x^0)^T = -2F(x^0)^T J(x^0) = -100\,(1, 50)$,
is almost orthogonal to the search direction $\Delta x^0$, the cosine
of the angle between them being

$$\frac{-\nabla T(x^0)^T \Delta x^0}{\|\nabla T(x^0)\|\,\|\Delta x^0\|} \approx 0.040 \quad \text{(which corresponds to 87.71 degrees).}$$

Thus, although the local contraction conditions are fulfilled quite well,
the slight nonlinearity together with the mild ill-conditioning of the problem
leads to very small stepsizes.
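The numbers quoted in the example can be checked with a short script. The sketch below assumes the second component of $F$ is $((x_1 - 50)^2/200 + x_2)/\sigma_2$ with $\sigma_2 = 1/50$; this form is inferred from the quantities stated in the example (Newton step, gradient, solution, curvature bound) and should be treated as an assumption. The helper names are illustrative.

```python
import math

SIGMA1, SIGMA2 = 1.0, 1.0 / 50.0   # sigma_2 = 1/50 belongs to the inferred form

def F(x):
    # Second component inferred from the numbers in the example (assumption).
    return [x[0] / SIGMA1, ((x[0] - 50.0) ** 2 / 200.0 + x[1]) / SIGMA2]

def J(x):
    return [[1.0 / SIGMA1, 0.0],
            [(x[0] - 50.0) / (100.0 * SIGMA2), 1.0 / SIGMA2]]

def solve2(a, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]

def merit(x):
    """Classical merit function T(x) = ||F(x)||_2^2."""
    f = F(x)
    return f[0] ** 2 + f[1] ** 2

def argmin_on_unit_interval(phi, iters=200):
    """Ternary search; adequate here because phi is convex on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if phi(m1) < phi(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

x0 = [50.0, 1.0]
f0, j0 = F(x0), J(x0)
dx0 = solve2(j0, [-f0[0], -f0[1]])                       # Newton direction
grad = [2.0 * (j0[0][0] * f0[0] + j0[1][0] * f0[1]),     # 2 J^T F
        2.0 * (j0[0][1] * f0[0] + j0[1][1] * f0[1])]
cosine = -(grad[0] * dx0[0] + grad[1] * dx0[1]) / (
    math.hypot(grad[0], grad[1]) * math.hypot(dx0[0], dx0[1]))
angle_deg = math.degrees(math.acos(cosine))
t_opt = argmin_on_unit_interval(
    lambda t: merit([x0[0] + t * dx0[0], x0[1] + t * dx0[1]]))
print(dx0, cosine, angle_deg, t_opt)
```

Under this assumed form, the script reproduces the Newton direction, the cosine near 0.040 and the optimal stepsize near 0.077 reported in the text.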

3.1. NATURAL LEVEL FUNCTIONS

To avoid this effect, two different ways can be followed.

Modification of the search direction. The classical way is the
Levenberg-Marquardt or trust region variant. Here, the search direction
is replaced by

$$\Delta x^k(\gamma) = -\bigl(J(x^k)^T J(x^k) + \gamma I\bigr)^{-1} J(x^k)^T F(x^k),$$

i.e. it is turned towards $-\nabla T(x^k)$ for large $\gamma$.

Modification of the level function. Recall that any level function
of the type

$$T_A(x) = \|A F(x)\|_2^2, \qquad A \text{ nonsingular},$$

is compatible with Newton’s method. The special choice $A := J(x^k)^{-1}$
yields a level function, called the “natural level function” by Deuflhard
[8], [9], with some distinctive properties.



Lemma 2 (Deuflhard [8], [9])

1 At the iterate $x^k$, the Newton direction $\Delta x^k$ is the steepest descent
direction of $T_A(x)$ with the choice $A := J(x^k)^{-1}$.

2 The level function is “affine invariant”, i.e. invariant with respect
to any affine transformation $F(x) \to BF(x)$, where $B$ is a
nonsingular matrix.

Proof. For $A = J(x^k)^{-1}$ the two vectors

$$-\nabla T_A(x^k) = -2J(x^k)^T A^T A F(x^k) \quad \text{and} \quad \Delta x^k = -J(x^k)^{-1}F(x^k)$$

are obviously collinear, and for all $B$

$$\|J(x^k)^{-1}F(x)\|_2^2 = \|(BJ(x^k))^{-1}BF(x)\|_2^2. \quad □$$

3 If the sequence $\{x^k\}$ converges to a solution $x^*$, then we have

$$\|J(x^k)^{-1}F(x)\|_2 = \|x - x^*\|_2\,\bigl(1 + O(\|x - x^*\|_2) + O(\|x^k - x^*\|_2)\bigr).$$

From the properties of Lemma 2 one may expect that the “natural
level function” approach should not suffer from the drawbacks demonstrated
by the example. It may in fact be viewed as a (local) rescaling
of $F$ to $AF$, such that the condition number of $AJ(x)$ is optimal,
namely 1. Figure 1 shows the steepest descent directions and contour
lines for both the classical and natural level functions in the case of the
Rosenbrock-type example.
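To make the contrast concrete, the following sketch compares an exact line search along the Newton direction for the classical merit function and for the natural level function $\|J(x^0)^{-1}F(x)\|_2^2$. It again assumes the inferred form of the Rosenbrock-type example (second component $50\,((x_1-50)^2/200 + x_2)$); the function names are illustrative.

```python
def F(x):
    # Inferred form of the Rosenbrock-type example (assumption).
    return [x[0], 50.0 * ((x[0] - 50.0) ** 2 / 200.0 + x[1])]

def J(x):
    return [[1.0, 0.0], [(x[0] - 50.0) / 2.0, 50.0]]

def solve2(a, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]

def classical_level(x):
    """T(x) = ||F(x)||_2^2."""
    f = F(x)
    return f[0] ** 2 + f[1] ** 2

def natural_level(x, x_ref):
    """||J(x_ref)^{-1} F(x)||_2^2 with the Jacobian frozen at x_ref."""
    v = solve2(J(x_ref), F(x))
    return v[0] ** 2 + v[1] ** 2

def argmin_on_unit_interval(phi, iters=200):
    """Ternary search; both level functions are convex along this direction."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if phi(m1) < phi(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

x0 = [50.0, 1.0]
f0 = F(x0)
dx0 = solve2(J(x0), [-f0[0], -f0[1]])

def point(t):
    return [x0[0] + t * dx0[0], x0[1] + t * dx0[1]]

t_classical = argmin_on_unit_interval(lambda t: classical_level(point(t)))
t_natural = argmin_on_unit_interval(lambda t: natural_level(point(t), x0))
print(t_classical, t_natural)
```

Under these assumptions the natural level function admits a stepsize close to the full step, while the classical merit function forces a stepsize more than ten times smaller.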




Figure 1 Rosenbrock-type example: natural level function vs classical merit function 



Step size procedures based on natural level functions were first introduced
by Deuflhard [8], based on a Goldstein type strategy. In Deuflhard
[9], refined predictor-corrector strategies for approximate line searches
were introduced, and combinations with rank-reduction strategies and
quasi-Newton modifications of the Jacobian were studied. The numerical
results given by Deuflhard [8], [9], and also by other authors who
adopted and modified the natural level function approach, showed very
good practical results in difficult applications (e.g., Ascher et al. [1], Bock
[3], [4], [5], and Nowak and Weimann [10]).

A major deficiency of the approach, however, is the fact that the
change of level functions in each step prevents the classical descent arguments
of global convergence proofs from holding. Hence no global convergence
proof has been given up to now. On the contrary, similar to the Chamberlain
[6] result on cycling for SQP methods using the $\ell_1$-penalty level
function, examples were constructed by Ascher and Osborne [2] and by
Plitt [12] that even showed the existence of two-cycles.



4. THE RESTRICTIVE MONOTONICITY 
TEST (RMT) 

In the following, we will derive a more restrictive stepsize strategy 
than exact or approximate line searches on the natural level function, 
which is a slight modification of techniques already successfully used in 
practice [5]. 

We will first show that these techniques may be interpreted as step- 
size strategies, analogous to those used in numerical methods for the 
discretization of ODE with invariants, thus offering a global convergence 
argument which is not based on descent properties. In a second step, 
we will discuss modifications and extensions of damped Newton method 
that ensure global convergence as well as efficiency. 

For the natural level function in step k, we establish the following 
quadratic bound that will provide a descent property. 

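Lemma 3 (Quadratic Upper Bound, where $\Delta x^k$ is the Newton direction)

$$\|J(x^k)^{-1}F(x^k + t\Delta x^k)\| \le \Bigl(1 - t + \frac{t^2}{2}\,\omega_1(t)\,\|\Delta x^k\|\Bigr)\,\|J(x^k)^{-1}F(x^k)\|, \tag{6}$$

where

$$\omega_1(t) = \sup_{0 < s \le t} \frac{\|J(x^k)^{-1}\bigl(J(x^k + s\Delta x^k) - J(x^k)\bigr)\|}{s\,\|\Delta x^k\|}.$$

Proof.

$$\begin{aligned}
\|J(x^k)^{-1}F(x^k + t\Delta x^k)\| &- (1 - t)\,\|J(x^k)^{-1}F(x^k)\| \\
&\le \bigl\|J(x^k)^{-1}\bigl(F(x^k + t\Delta x^k) - F(x^k) + tF(x^k) - tJ(x^k)\Delta x^k + tJ(x^k)\Delta x^k\bigr)\bigr\| \\
&= \bigl\|J(x^k)^{-1}\bigl(F(x^k + t\Delta x^k) - F(x^k) - tJ(x^k)\Delta x^k\bigr)\bigr\| \\
&= \Bigl\|J(x^k)^{-1}\int_0^t \bigl(J(x^k + s\Delta x^k) - J(x^k)\bigr)\Delta x^k\, ds\Bigr\| \\
&\le \omega_1(t)\,\|\Delta x^k\|^2 \int_0^t s\, ds = \frac{t^2}{2}\,\omega_1(t)\,\|\Delta x^k\|^2.
\end{aligned}$$

From the previous relations it follows that

$$\|J(x^k)^{-1}F(x^k + t\Delta x^k)\| \le (1 - t)\,\|\Delta x^k\| + \frac{t^2}{2}\,\omega_1(t)\,\|\Delta x^k\|^2 = \Bigl(1 - t + \frac{t^2}{2}\,\omega_1(t)\,\|\Delta x^k\|\Bigr)\,\|\Delta x^k\|. \quad □$$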



Since $t\,\omega_1(t)$, $t > 0$, is monotonically nondecreasing, one may choose
the damping factor $t^k$ in terms of this quadratic upper bound
such that

$$t^k = \max t \quad \text{s.t.} \quad t \le 1, \quad t\,\omega_1(t)\,\|\Delta x^k\| \le \eta,$$

for some prescribed $\eta < 2$. This means that we choose $t^k \le 1$ maximal
such that the “Restrictive Monotonicity Test” (RMT)

$$t^k\,\|\Delta x^k\| \le \min\Bigl(\frac{\eta}{\omega_1(t^k)},\, \|\Delta x^k\|\Bigr) \tag{7}$$

is fulfilled. RMT (7) together with Lemma 3 ensures that the weaker
traditional Armijo-type descent condition

$$\|J(x^k)^{-1}F(x^k + t^k(\eta)\Delta x^k)\| \le \Bigl(1 - t^k\bigl(1 - \tfrac{\eta}{2}\bigr)\Bigr)\,\|J(x^k)^{-1}F(x^k)\| \tag{8}$$

holds. Note that for $\eta = 1$, $t^k$ would minimize the right hand side
of QUB (6) with respect to $t$ if $\omega_1(t)$ were constant, or replaced by an
upper bound.

Remark The RMT ensures that the actual step length $t\,\|\Delta x\|$ does not exceed
the $1/\omega$ region in which $J(x^k)$ is a valid approximation of $J(x)$ according
to the definition of $\omega_1$.
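The step from the quadratic upper bound to the Armijo-type condition can be written out in one line. The following is a sketch of that reasoning, using only the bound (6) and the stepsize restriction $t\,\omega_1(t)\,\|\Delta x^k\| \le \eta$:

```latex
\begin{aligned}
1 - t + \frac{t^2}{2}\,\omega_1(t)\,\|\Delta x^k\|
  &= 1 - t\Bigl(1 - \frac{t\,\omega_1(t)\,\|\Delta x^k\|}{2}\Bigr)
   \le 1 - t\Bigl(1 - \frac{\eta}{2}\Bigr) < 1
   \qquad (0 < t \le 1,\ \eta < 2), \\
\frac{d}{dt}\Bigl(1 - t + \frac{t^2}{2}\,\bar\omega\,\|\Delta x^k\|\Bigr) = 0
  &\iff t\,\bar\omega\,\|\Delta x^k\| = 1
   \qquad (\bar\omega \text{ a constant bound, an assumption}).
\end{aligned}
```

The first line gives the contraction factor of (8); the second shows why $\eta = 1$ corresponds to minimizing the bound when the curvature is treated as a constant $\bar\omega$.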

Similar to Lemma 3, one can show more generally, that 

Lemma 4 
where = sup

0<5<t 






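Lemma 5 Assume we choose $\eta < 1$ and $t^k$ such that

$$t^k\,\omega_1(t^k)\,\|\Delta x^k\| \le \eta < 1. \tag{9}$$

Then

$$t^k\,\omega_2(t^k)\,\|\Delta x^k\| \le \eta\,\frac{1 + \eta}{1 - \eta}$$

for all $A \in \{J(x^k + s\Delta x^k)^{-1} \mid s \le t^k\}$.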

Proof. Let $J_0 = J(x^k)$ and $A = J(x^k + s\Delta x^k)^{-1}$ for some $s$, $0 < s \le t^k$.
By definition



0^2 {t^) = sup 

0<T<i^ 



\\A {J{x^ + tAx^) - Jo) II II Jo ^F{x^) 



s||^F(x^)|| 






< SUP f ll-4^oll l|Jo-M-^(^^+^Ax^)--^o)ll \\Jo^A-^\\ ||>in^^)H \ 

- o<r|t^ V spP(a:^)|| ||Jo-'Fp)|| J 

< a;x(p||^Jo|| ||PJo)-'|l- 

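Moreover, the choice of $A$, the definition of $\omega_1(t^k)$ and condition (9)
imply

$$\|J_0^{-1}(A^{-1} - J_0)\| \le s\,\omega_1(s)\,\|\Delta x^k\| \le \eta < 1,$$

which gives the classical estimates

$$\|AJ_0\| = \bigl\|\bigl(I - J_0^{-1}(J_0 - A^{-1})\bigr)^{-1}\bigr\| \le \frac{1}{1 - \eta},$$

$$\|J_0^{-1}A^{-1}\| = \|I + J_0^{-1}(A^{-1} - J_0)\| \le 1 + \eta.$$

It follows that

$$\omega_2(t^k) \le \frac{1 + \eta}{1 - \eta}\,\omega_1(t^k),$$

so another application of (9) provides the required result. □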



Lemma 4 and Lemma 5 imply that, if $\eta$ is chosen to satisfy

$$\eta\,\frac{1 + \eta}{1 - \eta} < 2, \quad \text{i.e.} \quad \eta < \tfrac{1}{2}\bigl(\sqrt{17} - 3\bigr),$$

then all level functions for intermediate choices of Jacobians also descend.
In particular, two-cycles are impossible. For example, $\eta = 1/2$
yields the property



Lemma 6 (No Two-Cycles when $\eta = 1/2$) For all $k$

Ofk 

hence 



(10) 

( 11 ) 



Ofk-\-\ -j-k 

so that 7 ^ 



Proof. Due to the choice $\eta = 1/2$, inequality (10) follows immediately
from (8), and inequality (11) from Lemmas 4 and 5 in the case
$A = J(x^{k+1})^{-1}$. □



Note, however, that this result still does not prove global convergence. 
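The admissible range for $\eta$ can be verified by elementary algebra. The sketch below assumes the condition has the form $\eta(1+\eta)/(1-\eta) < 2$, which is the form consistent with the stated bound $\tfrac{1}{2}(\sqrt{17} - 3)$:

```latex
\begin{aligned}
\eta\,\frac{1+\eta}{1-\eta} < 2,\quad 0 < \eta < 1
  &\iff \eta + \eta^2 < 2 - 2\eta
   \iff \eta^2 + 3\eta - 2 < 0 \\
  &\iff \eta < \frac{-3 + \sqrt{17}}{2} \approx 0.5616, \\
\eta = \tfrac12:\quad \eta\,\frac{1+\eta}{1-\eta}
  &= \frac{\tfrac12 \cdot \tfrac32}{\tfrac12} = \frac32 < 2 .
\end{aligned}
```

So the choice $\eta = 1/2$ lies inside the admissible range, with some margin.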



5. PRACTICAL REALIZATION OF THE 
RMT 



The costly evaluation of $\omega_1(t)$ can be avoided. In the quadratic upper
bound of Lemma 3, one can replace $\omega_1(t)$ by the weaker estimate for the
curvature

$$\omega_3(t) := \frac{2\,\bigl\|J(x^k)^{-1}\bigl(F(x^k + t\Delta x^k) - (1 - t)F(x^k)\bigr)\bigr\|}{t^2\,\|\Delta x^k\|^2} = \frac{2\,\bigl\|J(x^k)^{-1}\int_0^t \bigl(J(x^k + s\Delta x^k) - J(x^k)\bigr)\Delta x^k\, ds\bigr\|}{t^2\,\|\Delta x^k\|^2}.$$

The estimate $\omega_3(t)$ is easy to evaluate. Indeed, it involves only the
calculation of

$$\Delta x^k = -J(x^k)^{-1}F(x^k),$$

which is necessary anyway, and of

$$\overline{\Delta x}^k = -J(x^k)^{-1}F(x^k + t\Delta x^k),$$

just as in a line search procedure for the natural level function. However,
instead of a one dimensional minimization, and analogously to the more
restrictive test (7), we require the conditions

$$\eta^l \le t^k\,\omega_3(t^k)\,\|\Delta x^k\| \le \eta^u \tag{12}$$

with $\eta^l \le \eta \le \eta^u$. In the numerical tests of Section 9, $\eta = 1$, $\eta^l = 0.8\,\eta$
and $\eta^u = 1.2\,\eta$ were used.

As $\omega_3(t)$ is continuous, a simple rootfinding procedure for
$t\,\omega_3(t)\,\|\Delta x^k\| - \eta = 0$ is applied to satisfy (12). A good starting value for $t^k$ is provided
by the curvature estimate of the previous iteration, namely

$$t^k_{\text{start}} := \min\Bigl(1,\ \frac{\eta}{\omega_3(t^{k-1})\,\|\Delta x^k\|}\Bigr).$$

Thus, according to our experience, at most two F-evaluations per iteration
are required.
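A minimal sketch of this stepsize selection follows. It evaluates $\omega_3(t)$ from one extra $F$-evaluation and uses plain bisection (a stand-in for whatever rootfinder the authors used) until $t\,\omega_3(t)\,\|\Delta x^k\|$ falls into the band $[0.8, 1.2]$. The test system is the inferred form of the Rosenbrock-type example, and all function names are illustrative assumptions.

```python
import math

def F(x):
    # Inferred form of the Rosenbrock-type example (assumption).
    return [x[0], 50.0 * ((x[0] - 50.0) ** 2 / 200.0 + x[1])]

def J(x):
    return [[1.0, 0.0], [(x[0] - 50.0) / 2.0, 50.0]]

def solve2(a, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]

def norm(v):
    return math.hypot(v[0], v[1])

def newton_direction(x):
    fx = F(x)
    return solve2(J(x), [-fx[0], -fx[1]])

def omega3(x, dx, t):
    """Curvature estimate omega_3(t) using the Jacobian at x only."""
    fx = F(x)
    ft = F([x[0] + t * dx[0], x[1] + t * dx[1]])
    r = [ft[0] - (1.0 - t) * fx[0], ft[1] - (1.0 - t) * fx[1]]
    return 2.0 * norm(solve2(J(x), r)) / (t * t * norm(dx) ** 2)

def rmt_value(x, dx, t):
    return t * omega3(x, dx, t) * norm(dx)

def rmt_stepsize(x, dx, eta=1.0, eta_lo=0.8, eta_hi=1.2, max_bisect=60):
    """Accept the full step if allowed, else bisect into [eta_lo, eta_hi]."""
    if rmt_value(x, dx, 1.0) <= eta_hi:
        return 1.0
    lo, hi, t = 0.0, 1.0, 0.5
    for _ in range(max_bisect):
        t = 0.5 * (lo + hi)
        val = rmt_value(x, dx, t)
        if eta_lo <= val <= eta_hi:
            break
        if val > eta:
            hi = t
        else:
            lo = t
    return t

x_far = [500.0, 1.0]                 # hypothetical remote starting point
dx_far = newton_direction(x_far)
t_far = rmt_stepsize(x_far, dx_far)

x_near = [50.0, 1.0]                 # the starting point of the example
t_near = rmt_stepsize(x_near, newton_direction(x_near))
print(t_far, rmt_value(x_far, dx_far, t_far), t_near)
```

From the example's starting point the test accepts the full step, whereas the classical line search would have damped heavily; from the remote point the step is reduced until the estimated error is of the prescribed relative size.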

This restrictive monotonicity test works very well in practical applications.
Although the rigorous proof of Lemma 6 does not hold for
the weaker curvature measure used here, cycling does not occur for the
Ascher-Osborne example, and has not been observed in practical applications.
It is hoped that a similar proof of non-cycling, possibly for
sharper $\eta$, can be found.

Nevertheless, we keep in mind that even Lemma 6 does not provide a 
global convergence proof for either version (7) or (12) of the RMT based 
on descent arguments. 

6. A DIFFERENT INTERPRETATION OF 
THE RMT 

Let us consider, instead of a single mapping $F$, a family $H : \mathbb{R}^n \times [0, 1] \to \mathbb{R}^n$
depending on a parameter $\lambda$ such that

$$H(x^0, 0) = 0, \qquad H(x, 1) = F(x),$$

where $x^0$ is some initial solution ($\lambda = 0$). For example, we can set

$$H(x, \lambda) = F(x) - (1 - \lambda)F(x^0). \tag{13}$$

Let us consider the equation

$$H(x(\lambda), \lambda) = 0, \qquad \lambda \in [0, 1]. \tag{14}$$

Under certain assumptions (see [11]), e.g. those of Lemma 1, (14) defines
a unique continuously differentiable function $x(\lambda)$ (homotopy path), and
$x(\lambda)$, $\lambda \in [0,1]$, satisfies the implicit, so-called Davidenko differential
equations [7]

$$\frac{\partial H(x(\lambda), \lambda)}{\partial x}\,\dot{x}(\lambda) = -\frac{\partial H(x(\lambda), \lambda)}{\partial \lambda}, \qquad \forall \lambda \in [0, 1], \quad x(0) = x^0.$$

For the choice (13), the last expression takes the form

$$\dot{x} = -J(x)^{-1}F(x^0) = -\frac{1}{1 - \lambda}\,J(x)^{-1}F(x), \qquad \forall \lambda \in [0, 1], \quad x(0) = x^0. \tag{15}$$

Let us introduce the change of variable $\lambda = 1 - e^{-\tau}$. Then $\tau$ varies from
0 to $+\infty$ as $\lambda$ varies from 0 to 1, and (13) has the form

$$H(x, \tau) = F(x) - e^{-\tau}F(x^0), \qquad \tau \in [0, +\infty). \tag{16}$$

The differential equation corresponding to (16) is given by

$$\frac{dx}{d\tau} = -J(x)^{-1}F(x), \qquad \forall \tau \in [0, +\infty), \quad x(0) = x^0. \tag{17}$$

We can consider the problem of constructing the trajectory $x(\lambda)$ of (15)
(or $x(\tau)$ of (17)) as numerical integration of the ODE (15) (or (17)). If
we integrate (17) by Euler’s method with stepsizes $t^k$ we obtain

$$x^{k+1} = x^k - t^k J(x^k)^{-1}F(x^k), \qquad k = 0, 1, \ldots,$$

which is the damped Newton method. Thus, we can view the damped
Newton method as an Euler approximation of the continuous Newton
equation (17).
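The Euler interpretation is easiest to see on a linear system, where the damped step reproduces the invariant of the homotopy exactly: $F(x^{k+1}) = (1 - t)F(x^k)$. The sketch below uses an arbitrary linear test map chosen for illustration; it is not taken from the paper.

```python
def solve2(a, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]

# Illustrative linear map F(x) = M x - b with constant Jacobian M (assumption).
M = [[2.0, 1.0], [1.0, 3.0]]

def F(x):
    return [2.0 * x[0] + x[1] - 3.0, x[0] + 3.0 * x[1] - 4.0]

def euler_newton_step(x, t):
    """One Euler step of size t for dx/dtau = -J(x)^{-1} F(x)."""
    fx = F(x)
    dx = solve2(M, [-fx[0], -fx[1]])
    return [x[0] + t * dx[0], x[1] + t * dx[1]]

t = 0.25
x_start = [10.0, -7.0]
x_next = euler_newton_step(x_start, t)
f_start, f_next = F(x_start), F(x_next)
print(f_start, f_next)   # each residual component shrinks by the factor 1 - t
```

For nonlinear $F$ the same step only satisfies the invariant up to the local integration error, which is what the stepsize control discussed next keeps small.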

Next we will show that the RMT is nothing but a stepsize control for 
Euler’s method applied to the Davidenko differential equations. First 
we estimate the local integration error. For simplicity we only consider 
the first iteration step. 

Lemma 7 The local integration error of Euler’s method applied to (15)
is given by

$$e(t) := -J(x^0)^{-1}\bigl(F(x^0 + t\Delta x^0) - (1 - t)F(x^0)\bigr) + O(t^3).$$

Proof. The local error is defined as

$$e(t) = x(t) - x^0 - t\Delta x^0,$$

where $x(t)$ satisfies the invariant

$$F(x(t)) = (1 - t)F(x^0). \tag{18}$$

From a Taylor series expansion of (18) we have

$$\dot{x}(0) = -J(x^0)^{-1}F(x^0) = \Delta x^0,$$

$$\ddot{x}(0) = -J(x^0)^{-1}\Bigl(\frac{d}{dt}J(x(t))\Big|_{t=0}\Bigr)\dot{x}(0) = -\frac{2}{t^2}\,J(x^0)^{-1}\bigl(F(x^0 + t\Delta x^0) - (1 - t)F(x^0)\bigr) + O(t),$$

and $x(t) = x^0 + t\Delta x^0 + \frac{t^2}{2}\,\ddot{x}(0) + O(t^3)$, which imply the required result. □



From the new point of view, the RMT 



(19) 



is simply a stepsize control for Euler’s method. By Lemma 7 the term
controlled in formula (19) is an asymptotically correct estimate of the
local integration error. It is kept small compared to the increment norm,
which is controlled by the choice of $\eta$, in order to ensure that the Newton
path is followed with a desired accuracy.

We can go one step further, if we take into account that (15) is an
implicit ordinary differential equation with known invariant given by
equation (18). Similar to techniques used in discretization methods for
ODE or DAE with invariants, e.g. [13], we can therefore exploit the
invariant for a stabilization step, which is a “back projection” of

$$x^{k+1,0} := x^k + t\Delta x^k$$

to the invariant manifold, which is a curve in our case. This step can
be performed by adding the correction term already computed for the RMT:

$$x^{k+1,1} = x^{k+1,0} + \widetilde{\Delta x}^k = x^k + t\Delta x^k + \widetilde{\Delta x}^k, \qquad \widetilde{\Delta x}^k := -J(x^k)^{-1}\bigl(F(x^k + t\Delta x^k) - (1 - t)F(x^k)\bigr).$$

Unlike Euler’s method, which is of first order, the combined two step
method is of second order. Note that as soon as $t^k = 1$ is reached,
the additional back projection step extends the quadratically convergent
Newton’s method to a well-known cubically convergent modification.
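The effect of one back projection can be observed numerically. The sketch below takes a damped step on the inferred form of the Rosenbrock-type example (an assumption) from a hypothetical remote point, applies the correction with the Jacobian frozen at $x^k$, and compares the distance to the invariant $F(x) = (1 - t)F(x^k)$ before and after.

```python
import math

def F(x):
    # Inferred form of the Rosenbrock-type example (assumption).
    return [x[0], 50.0 * ((x[0] - 50.0) ** 2 / 200.0 + x[1])]

def J(x):
    return [[1.0, 0.0], [(x[0] - 50.0) / 2.0, 50.0]]

def solve2(a, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]

def invariant_residual(x_new, x_old, t):
    """|| F(x_new) - (1 - t) F(x_old) ||, the distance from the invariant."""
    fn, fo = F(x_new), F(x_old)
    return math.hypot(fn[0] - (1.0 - t) * fo[0], fn[1] - (1.0 - t) * fo[1])

def damped_step_with_back_projection(x, t):
    """Return the Euler point x^{k+1,0} and the projected point x^{k+1,1}."""
    fx, jx = F(x), J(x)
    dx = solve2(jx, [-fx[0], -fx[1]])
    x_pred = [x[0] + t * dx[0], x[1] + t * dx[1]]
    fp = F(x_pred)
    r = [fp[0] - (1.0 - t) * fx[0], fp[1] - (1.0 - t) * fx[1]]
    corr = solve2(jx, [-r[0], -r[1]])          # uses J(x^k), no new Jacobian
    x_corr = [x_pred[0] + corr[0], x_pred[1] + corr[1]]
    return x_pred, x_corr

xk, t = [500.0, 1.0], 0.5                      # hypothetical point and stepsize
x_pred, x_corr = damped_step_with_back_projection(xk, t)
res_before = invariant_residual(x_pred, xk, t)
res_after = invariant_residual(x_corr, xk, t)
print(res_before, res_after)
```

In this particular example the corrected point lands on the invariant up to rounding, because the assumed $F$ is linear in $x_2$ and the Euler error lies entirely in that component; in general one expects a reduction rather than exact cancellation.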

In terms of Newton’s method, the combined scheme can be interpreted 
as the first two steps of a simplified full step Newton method to solve 
F{x) — (1 — t)F(x^) = 0, starting from x^. The RMT then plays the 
role of a monitoring test to choose t sufficiently small, in order that 
contractivity (by 77^/2) of this scheme is guaranteed. This projection 
could of course be repeated until the Newton path is met with a desired 
accuracy. A natural and necessary extension of the RMT is then to 
check sufficient contractivity of the iterations e.g. 



||J(xO)-i(f(x 1'0 - (1 -<)i^(x°))|| ~ 2 



7?* < 7?** < 2. 
If insufficient contractivity occurs, then starting over with reduced $\eta$
(hence $t$) seems to be preferable to additional damping during “back
projections” or to re-computing the Jacobian.

Remark. It can be shown that the stepsize strategy given here eventu- 
ally leads to a full step method when the local convergence conditions 
of Theorem 1 are satisfied. 

7. VARIATIONS OF THE DAMPED 
NEWTON METHOD 

The interpretation as an error controlled integration method for an 
ODE with invariants allows various modifications and variations of the 
basic method. 

Basic strategy 

The basic strategy, which was used for the numerical computations 
presented below, is as follows: 

1 compute the Newton direction $\Delta x^k$,

2 compute $x^{k+1} = x^k + t^k\Delta x^k$, where $t^k$ satisfies the RMT as error
control,

3 restart the homotopy path (equivalently, continue integration) from $x^{k+1}$.
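The three steps above can be sketched as a complete loop. The code below is an illustration under stated assumptions, not the authors' implementation: it uses the inferred form of the Rosenbrock-type example, the curvature estimate $\omega_3$ with the band $[0.8, 1.2]$, bisection as the rootfinder, and a hypothetical remote starting point.

```python
import math

def F(x):
    # Inferred form of the Rosenbrock-type example (assumption).
    return [x[0], 50.0 * ((x[0] - 50.0) ** 2 / 200.0 + x[1])]

def J(x):
    return [[1.0, 0.0], [(x[0] - 50.0) / 2.0, 50.0]]

def solve2(a, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]

def norm(v):
    return math.hypot(v[0], v[1])

def rmt_value(x, dx, t):
    """t * omega_3(t) * ||dx||, which simplifies to 2 ||J^{-1} r|| / (t ||dx||)."""
    fx = F(x)
    ft = F([x[0] + t * dx[0], x[1] + t * dx[1]])
    r = [ft[0] - (1.0 - t) * fx[0], ft[1] - (1.0 - t) * fx[1]]
    return 2.0 * norm(solve2(J(x), r)) / (t * norm(dx))

def rmt_stepsize(x, dx, eta=1.0, eta_lo=0.8, eta_hi=1.2, max_bisect=60):
    if rmt_value(x, dx, 1.0) <= eta_hi:
        return 1.0
    lo, hi, t = 0.0, 1.0, 0.5
    for _ in range(max_bisect):
        t = 0.5 * (lo + hi)
        val = rmt_value(x, dx, t)
        if eta_lo <= val <= eta_hi:
            break
        if val > eta:
            hi = t
        else:
            lo = t
    return t

def damped_newton_rmt(x, tol=1e-10, max_iter=100):
    """Basic strategy: Newton direction, RMT stepsize, restart from the new point."""
    steps = []
    for _ in range(max_iter):
        fx = F(x)
        dx = solve2(J(x), [-fx[0], -fx[1]])     # step 1
        if norm(dx) < tol:
            break
        t = rmt_stepsize(x, dx)                  # step 2
        steps.append(t)
        x = [x[0] + t * dx[0], x[1] + t * dx[1]]  # step 3: restart from x^{k+1}
    return x, steps

solution, stepsizes = damped_newton_rmt([500.0, 1.0])
print(solution, stepsizes)
```

In this run the first steps are damped and the iteration then switches to full steps near the solution, which matches the behaviour the remark on eventual full-step convergence describes.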

Basic strategy with back projections 

A more expensive strategy which needs one (or more) additional F- 
evaluation(s) is: 

1 compute the Newton direction $\Delta x^k$,

2 compute $x^{k+1,0} = x^k + t^k\Delta x^k$, where $t^k$ satisfies the RMT,

3 add one (or more) back projection step(s)

$$x^{k+1,i+1} = x^{k+1,i} - J(x^k)^{-1}\bigl(F(x^{k+1,i}) - (1 - t^k)F(x^k)\bigr),$$

4 restart the homotopy path from the last $x^{k+1,i}$.

In this variant we first have to ensure local convergence to the Newton
path using the convergence behaviour of the back projections for a
reduction strategy for $\eta$, hence also for the stepsizes $t$. From this then
follows global convergence under certain assumptions like nonsingularity
of Jacobians along the Newton path, since along this path all level
functions $\|AF(x)\|_2$, with $A$ nonsingular, descend by a factor of $1 - t$.

A termination criterion could be based on the latter property. In our
numerical tests, however, repeated back projections were not found to
be superior to the basic strategy. Apparently, the extra effort to iterate
back to the continuous Newton path does not necessarily lead to a better
iterate even though it guarantees global convergence.

Similar to techniques used in discretization methods for ODE with 
invariants, one can of course consider constructing higher order methods 
for integration of the Newton path. A few comments from the numerical 
ODE point of view may be made. 

1 Since a highly accurate solution of the Newton path is unlikely to 
be necessary except maybe in extreme cases of ill-conditioning, low 
order methods should be most efficient. 

2 The Davidenko equation is an implicit ODE and should be treated 

as such. In order to avoid frequent expensive unnecessary and 
possibly inaccurate evaluations of the use of higher order 

Runge-Kutta methods is not recommended. 

3 Since back projection to the invariant curve effectively inhibits 
error propagation, consistency error considerations are sufficient 
for the construction of integration methods. Suitable candidates 
may be multistep methods based on solution values and occasional 
derivative evaluations. 

A modification worth investigating may be to vary the steps 
with the index i, e.g., as in the second order variant 

^^k+ho ^ - J(x^)-1 - (1 - , 

xk+-^^ = X* + 

8. EXTENSIONS TO $l_2$- AND
$l_1$-PARAMETER ESTIMATION

Parameter estimation problems in dynamic processes can be expressed 
as nonlinearly constrained optimization problems in the general form 

$$\min_x \|F_0(x)\|_\nu, \quad \nu \in \{1, 2\}, \tag{20}$$

$$F_1(x) = 0, \quad F_2(x) \ge 0,$$

where the cost functional is the $l_2$- or $l_1$-norm of the vector function
$F_0(x)$. The vector of variables $x = (y, p)$ consists of “state” variables
$y \in \mathbb{R}^{n_y}$, typically discretization variables for underlying initial or boundary
value problems in ODEs or DAEs, and unknown parameters $p$ to be
estimated.

Traditionally, the problem (20) is solved by the constrained Gauss-Newton
method [3], according to which a new iterate is given by

$$x^{k+1} = x^k + t^k \Delta x^k,$$



where the increment $\Delta x^k$ solves the linearly constrained problem

$$\min_{\Delta x^k} \|F_0(x^k) + J_0(x^k)\Delta x^k\|_\nu, \quad \nu \in \{1, 2\}, \tag{21}$$

$$F_1(x^k) + J_1(x^k)\Delta x^k = 0,$$

$$F_2(x^k) + J_2(x^k)\Delta x^k \ge 0.$$


In both cases the solution $\Delta x^k$ of (21) can be represented in the form
$\Delta x^k = -J^+(x^k)F(x^k)$, where $J^+(x^k)$ is a generalized inverse, i.e. it
satisfies the condition

$$J^+ J J^+ = J^+, \qquad F = \begin{pmatrix} F_0 \\ F_1 \\ F_2 \end{pmatrix}.$$

For example, for the unconstrained $l_2$-case, $J^+(x^k)$ is the Moore-Penrose
inverse of $J(x^k)$. The $l_1$-solution interpolates some of the
measurements, i.e. some components of the linearized function

$$F_0(x) + J_0(x)\Delta x$$

are equal to zero (“active”). Under certain conditions, the generalized
inverse of $J$ is then the inverse of a projection of $J(x^k)$, which contains
the active measurements, equality constraints and active inequality
constraints.

Using the generalized inverse, the Gauss-Newton method becomes

$$x^{k+1} = x^k + t^k \Delta x^k, \qquad \Delta x^k = -J^+(x^k)F(x^k). \tag{22}$$

With these preparations, we can formally extend our quadratic upper
bound and restrictive monotonicity test to $\|J^+(x^k)F(x^k + t^k \Delta x^k)\|$. The
Gauss-Newton method (22) can then be interpreted as a stepsize control
for the Euler method applied to the continuous Gauss-Newton method.
Note, however, that the active inequality constraints incorporated in
$J^+(x^k)$ as well as the active measurements are changing along the
solution of the continuous problem, so that the stepsize strategy must be
combined with an additional monitoring of the changing active sets.
9. NUMERICAL RESULTS

With the new stepsize strategies two challenging test problems were 
treated. The optimal control problem of the re-entry of an Apollo space- 
craft is known for its hard nonlinearities, due to the aerodynamic forces 
and has a very small region of feasible solutions [14]. In the estimation 
of the reaction constants in the nonlinear differential equation modelling 
the denitrogenization of pyridine, ill-conditioning and complicated sta- 
bility problems for poor initial guesses of the parameters occur. The 
results show the potential of the new approach. 



9.1. RE-ENTRY PROBLEM [14] 

In this problem a control has to be chosen to minimize the heating 
of a space vehicle during the flight through the earth’s atmosphere on 
the way back from the outer space. Numerical difficulties are caused 
by extreme instability properties due to the aerodynamic forces when 
entering the atmosphere. Convergence using Newton’s method can be 
expected only if the initial guess is fairly close to the solution. 

Applying the maximum principle to this optimal control problem
results in a boundary value problem in the states $v$, $\gamma$, $\xi$, the adjoints $\lambda_v$,
$\lambda_\gamma$, $\lambda_\xi$ and the free final time $T$:



i) 



7 

e 



A^ 



A 
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gsin7 



2m 
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(1 + 0^ 
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V sin 7 






V cos 7 
with boundary conditions

$$v(0) = 0.36, \quad \gamma(0) = -8.1\pi/180, \quad \xi(0) = 4/R,$$

$$v(T) = 0.27, \quad \gamma(T) = 0, \quad \xi(T) = 2.5/R,$$

-l0v{Tf^ + v{T)X,{T) + j{T)Xj{T) + i{T)X^{T) = 0, 

where po = 0.002704, R = 209., /? = 4.26, Cw{u) = 1.174 — 0.9 cos u, 
= 0.6 sin u, S/2m = 26.600, p = 3.2172 x 10“^, and the control u 
is given by 

0.6A^ 0.9^;A-y 

sin?i = cosu = , w = \/ (0.6A^)^ + (0.9^’A-y)2. 

w 

We parametrized this boundary value problem with a multiple shoot- 
ing technique, using 6 equidistantly distributed nodes. The initial guesses 
were generated according to a technique described in [14]. For the so- 
lution of the resulting system of nonlinear equations with 37 variables 
Newton’s method using the basic stepsize strategy was used. The so- 
lution (relative accuracy 10"“^) was achieved after 8 iterations, 4 with 
damped steps. Figure 2 shows the stepsizes in every iteration. 

It is difficult to compare these results with those documented in the
literature [14], [8], [9], because there Broyden approximations and finite
difference approximations were used. One may say, however, that our
results are very competitive with the fastest results published so far.




Figure 2 Stepsize history for the re-entry problem (horizontal axis: iteration)
9.2. PARAMETER ESTIMATION IN THE
DENITROGENIZATION OF PYRIDINE

This problem (originally due to Zwaga [15]) was investigated in [3]. 
At first only pyridine is present which initiates a reaction process that 
can be described by ODEs with 7 state variables, the concentrations of 
the species: 

Pyridine: $\dot{A} = -p_1 A + p_9 B$

Piperidine: $\dot{B} = p_1 A - p_2 B - p_3 BC + p_7 D - p_9 B + p_{10} DF$

Pentylamine: $\dot{C} = p_2 B - p_3 BC - 2 p_4 CC - p_6 C + p_8 E + p_{10} DF + 2 p_{11} EF$

N-Pentylpiperidine: $\dot{D} = p_3 BC - p_5 D - p_7 D - p_{10} DF$

Dipentylamine: $\dot{E} = p_4 CC + p_5 D - p_8 E - p_{11} EF$

Ammonia: $\dot{F} = p_3 BC + p_4 CC + p_6 C - p_{10} DF - p_{11} EF$

Pentane: $\dot{G} = p_6 C + p_7 D + p_8 E$.

The rate constants $p_1, p_2, \dots, p_{11}$ of these ODEs are unknown and have
to be estimated from 77 measurements of the states at the times 0.5, 1,
..., 5.5.
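As a minimal sketch, the right-hand side of this reaction model can be coded directly from the seven rate equations. The function name `pyridine_rhs` and the argument layout are illustrative assumptions, not part of PARFIT. The $p_{10}$ term of the piperidine equation is read here as $p_{10}DF$, the reading under which the nitrogen balance closes: the first six species each carry one nitrogen atom, so their rates sum to zero. The measurement count is consistent with 7 states observed at 11 time points, $7 \times 11 = 77$.

```python
def pyridine_rhs(y, p):
    """Time derivatives of the 7 concentrations (A, B, C, D, E, F, G).

    y: concentrations in the order pyridine, piperidine, pentylamine,
       N-pentylpiperidine, dipentylamine, ammonia, pentane.
    p: the 11 rate constants p1..p11 (values are to be estimated).
    """
    A, B, C, D, E, F, G = y
    p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11 = p
    dA = -p1 * A + p9 * B
    dB = p1 * A - p2 * B - p3 * B * C + p7 * D - p9 * B + p10 * D * F
    dC = (p2 * B - p3 * B * C - 2 * p4 * C * C - p6 * C + p8 * E
          + p10 * D * F + 2 * p11 * E * F)
    dD = p3 * B * C - p5 * D - p7 * D - p10 * D * F
    dE = p4 * C * C + p5 * D - p8 * E - p11 * E * F
    dF = p3 * B * C + p4 * C * C + p6 * C - p10 * D * F - p11 * E * F
    dG = p6 * C + p7 * D + p8 * E
    return [dA, dB, dC, dD, dE, dF, dG]


# With only pyridine present, the reaction starts by converting A into B.
rates = pyridine_rhs([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [1.0] * 11)
```

The parameter values used in the call are placeholders chosen for illustration only.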

We treated this parameter estimation problem with the multiple shooting
code PARFIT [3], which has a generalized Gauss-Newton method
as a core routine for the solution of the structured constrained least
squares problems. For globalization we implemented the basic strategy
($\eta = 1$), replacing the inverse by the generalized inverse, that is the
solution operator for the constrained linear least squares problems.

We performed 8 experiments with widely varying initial guesses for 
the 11 parameters. As initial guesses for the states we chose the mea- 
surements. The initial guesses for the parameters = Of, z = 1, ..., 7, for 
different a were rather poor guesses, since the true values vary between 
0.201 and 29.4. In all cases the algorithm safely converged. The history 
of stepsizes is shown in Figure 3. The number of iterations, the number 
of damped steps and the achieved accuracy are given in Table 1. 



10. CONCLUSIONS

It is well known that Newton’s method for nonlinear equations can be 
forced to converge globally in a domain where the Jacobian is nonsingu- 
lar. However, the price one has to pay is unnecessarily small stepsizes 
in the local convergence domain of the full step method if the problem 
Figure 3 Stepsize histories for the pyridine problem (panels: α = -0.1, α = 0, α = 0.01, α = 0.1; horizontal axis: iteration)
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661.18 


1.28x10'^ 
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0.1 


40.09 


5.41 xlO-^ 
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0.2 


16.70 


7.82x10-® 
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2.24 


3.76x10-® 
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5.0 


2.65 


3.68x10"® 


17 


11 


10.0 


66.83 


7.64x10"® 


27 


22 



Table 1 Convergence behaviour of the Gauss-Newton method with RMT for the 
pyridine problem for different initial guesses 



is mildly ill-conditioned and nonlinear and one chooses classical merit 
functions. 

The paper presents a new strategy which combines previously defined 
“natural level functions” with a restrictive monotonicity test. The new 
Figure 3 (continued) Stepsize histories for the pyridine problem (panels: α = 0.2, α = 1, α = 5, α = 10; horizontal axis: iteration)



global convergence argument is quite different from the classical descent
type proof. It is shown that this combination of natural level functions,
for which descent proofs do not hold, and the restrictive monotonicity
test overcomes the problem of choosing too small steps. The new
stepsize strategy is viewed as a stepsize control for the continuous Newton
method which makes use of an invariant of the Newton path.

Three practical interpretations can be given. The first interpretation
is a stepsize control of the continuous Newton method by means
of an asymptotically correct estimate of the local error. Secondly, we
interpret the damped Newton method as an attempt to solve a relaxed
problem with a full step Newton method (with Jacobian kept constant),
and the RMT is a test on sufficient contractivity. Thirdly, the stepsize
is allowed to go as far as the approximation of the Jacobian is valid.

The second argument suggests generalizations to other stepsize and
trust region strategies. Most of them can be interpreted as attempts to
solve a relaxed version of the original nonlinear problem. In the spirit of
this paper a stepsize (or trust region) strategy can be based on checking
whether a sufficient (local) contraction of the method to the solution of
the relaxed problem is achieved. It is hoped that this approach proves
to be equally effective in these other areas.

Numerical results are given for two demanding problems from optimal
control and parameter estimation in ODEs, which are notorious for their
strong nonlinearities. They show a very nice and reliable convergence
behaviour.
Abstract A parallel implementation of the specialized interior-point algorithm
for multicommodity network flows introduced in [6] is presented. In
this algorithm, the positive definite systems of each iteration are solved
through a scheme that combines direct factorizations and a preconditioned
conjugate gradient (PCG) method. Although this numerical
procedure works well in practice, it requires the solution of at least k
systems of equations at each iteration of the PCG, k being the number
of commodities to be routed through the network.

In order to reduce the time spent by the PCG method, we propose
the application of coarse-grained parallel strategies for solving the k
linear systems of equations at each PCG iteration. Since the number of
arithmetic operations to be performed for each commodity is the same,
the load balancing between processors is guaranteed, which avoids
unnecessary delays. An extensive set of computational results on a shared
memory machine is presented, using problems of up to 2.5 million
variables and 260,000 constraints. For the largest PDS (Patient Distribution
System) problems, the efficiency of the parallel implementation
developed is about 80%, which confirms that it can be a promising tool
for very large and difficult multicommodity instances.

Keywords: interior-point methods, linear programming, multicommodity network 
flows, parallel computing. 
1. INTRODUCTION

Multicommodity flows are one of the most challenging problems for 
linear programming solvers. This is partly due to the large size of these 
models in real world applications (e.g., routing in telecommunications 
networks). The need to solve very large multicommodity instances has 
led to the development of both specialized algorithms and parallel im- 
plementations. In this work we introduce a parallel implementation of 
a specialized multicommodity interior-point algorithm. The implemen- 
tation has two main features. From the multicommodity point of view, 
it is not based on a decomposition approach, and thus it does not fol- 
low the master-slaves (or coordinator-subtasks) parallel scheme. From 
the interior-point point of view, unlike other parallel interior-point codes 
[4, 8, 15], the parallelization is not focused on the Cholesky factoriza- 
tion to be performed at each iteration — though it could be included — 
but on the parallel solution of smaller subsystems related to the various 
commodities of the problem. 

The block angular structure of the multicommodity problem con- 
straints matrix has led to a number of specialized methods. Among 
the earlier approaches we could mention primal partitioning, and price 
and resource directive decomposition (see [2, 14] for a general descrip- 
tion). Recent variants of price directive decomposition have successfully 
applied bundle methods [10] and analytic centers [11]. Multicommodity 
problems, such as the PDS ones, were also used to test the efficiency 
of the early general interior-point solvers for linear programming (e.g., 
[1]). Attempts to develop specialized interior-point algorithms for multi- 
commodity flows were presented in [13, 20, 6], the latter being the most 
successful. This is the algorithm that will be parallelized in this work. 

Parallel approaches for multicommodity problems have also been wide- 
ly studied in the past. As in the sequential case, the parallel implemen- 
tations make use of several decomposition strategies, such as bundle 
methods [7, 17], linear-quadratic penalty terms [19], and, more recently, 
analytic centers [12]. A discussion of these and other parallel decom- 
position approaches is presented in [7]. A general description of the 
parallelization of mathematical programming algorithms can be found 
in [5] and [21]. 

The paper is organized as follows. Section 2 presents the formulation 
of the problem to be solved. Section 3 outlines the specialized interior- 
point algorithm for multicommodity flows, including a brief description 
of the general path-following method. Section 4 deals with the paral- 
lelization issues of the specialized multicommodity algorithm. Finally, 
Section 5 gives the computational results obtained with the parallel
implementation developed.

2. PROBLEM FORMULATION 

In the most general case, the multicommodity network flow problem 
can be stated as how to obtain the best routing (that which involves 
the minimum cost) of a set of k commodities through a network of m 
nodes and n arcs, where the arcs have an individual capacity for each 
commodity, and a mutual capacity for all the commodities. The resulting 
problem can be written as 
$$0 \le x^{(i)} \le \bar{x}^{(i)}, \quad i = 1, \dots, k, \tag{3}$$

$$0 \le s_{mc} \le b_{mc}. \tag{4}$$

Vectors $x^{(i)} \in \mathbb{R}^n$ and $c^{(i)} \in \mathbb{R}^n$ are the flow and cost arrays for each
commodity $i$, $i = 1, \dots, k$. $s_{mc} \in \mathbb{R}^n$ denotes the slacks of the mutual
capacity constraints. $A_N \in \mathbb{R}^{m \times n}$ is the node-arc incidence matrix.

We shall assume that $A_N$ is a full row-rank matrix. This can always
be guaranteed by removing any of the (redundant) node balance constraints.
$b^{(i)} \in \mathbb{R}^m$ is the vector of supplies/demands for commodity $i$
at the nodes of the network. Constraints (3) are simple bounds on the
flows, $\bar{x}^{(i)} \in \mathbb{R}^n$, $i = 1, \dots, k$, being the upper bounds. $b_{mc} \in \mathbb{R}^n$ are
the mutual capacities of the arcs for all the commodities. $I_n$ denotes the
$n \times n$ identity matrix.

Note that the multicommodity flow problem can be formulated as a
linear programming one with $\tilde{m} = km + n$ constraints and $\tilde{n} = (k + 1)n$
variables.
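The size of the resulting linear program follows directly from these two formulas. The helper below is a small illustrative sketch (the name `lp_dimensions` is hypothetical); the sample call uses the node, arc and commodity counts of the first Mnetgen instance listed in the computational results.

```python
def lp_dimensions(m, n, k):
    """Return (constraints, variables) of the LP for m nodes, n arcs, k commodities.

    constraints = k*m + n : k blocks of node balance rows plus n mutual capacity rows
    variables   = (k+1)*n : k flow vectors of length n plus n mutual capacity slacks
    """
    constraints = k * m + n
    variables = (k + 1) * n
    return constraints, variables


# 128 nodes, 1089 arcs, 8 commodities
constraints, variables = lp_dimensions(128, 1089, 8)
```

For this instance the formulas give 2113 constraints and 9801 variables, which shows how quickly the LP grows with the number of commodities.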
3. OUTLINE OF THE SPECIALIZED

INTERIOR-POINT ALGORITHM FOR 
MULTICOMMODITY FLOWS 

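The interior-point algorithm for multicommodity flows introduced in
[6] is a specialization of the path-following algorithm for linear programming
(see [23] for a thorough description). Let us consider the following
linear programming problem in primal form

$$\begin{array}{rl} \min & c^T x \\ \text{subject to} & Ax = b, \\ & x + f = \bar{x}, \\ & x, f \ge 0, \end{array} \tag{5}$$

where $x \in \mathbb{R}^{\tilde{n}}$ and $f \in \mathbb{R}^{\tilde{n}}$ are the primal variables, $\bar{x} \in \mathbb{R}^{\tilde{n}}$ are the
upper bounds, $c \in \mathbb{R}^{\tilde{n}}$, $b \in \mathbb{R}^{\tilde{m}}$, and $A \in \mathbb{R}^{\tilde{m} \times \tilde{n}}$ is a full row-rank
matrix. The dual of (5) is

$$\begin{array}{rl} \max & b^T y - \bar{x}^T w \\ \text{subject to} & A^T y + z - w = c, \\ & z, w \ge 0, \end{array} \tag{6}$$

where $y \in \mathbb{R}^{\tilde{m}}$, $z \in \mathbb{R}^{\tilde{n}}$ and $w \in \mathbb{R}^{\tilde{n}}$ are the dual variables.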

Replacing the inequalities in (5) by a logarithmic barrier in the
objective function, with parameter $\mu$, it can be seen that the KKT first
order optimality conditions of this barrier problem are equivalent to the
following system of nonlinear equations:

$$\begin{aligned}
r_{xz} &\equiv \mu e_{\tilde{n}} - XZe_{\tilde{n}} = 0, \\
r_{fw} &\equiv \mu e_{\tilde{n}} - FWe_{\tilde{n}} = 0, \\
r_b &\equiv b - Ax = 0, \\
r_c &\equiv c - (A^T y + z - w) = 0, \\
(x, z, w) &\ge 0, \\
\bar{x} &\ge x,
\end{aligned} \tag{7}$$

where $e_{\tilde{n}}$ is the $\tilde{n}$-dimensional vector of 1’s, $X$, $Z$, $F$, and $W$ are diagonal
matrices defined as $M = \mathrm{diag}(m_1, \dots, m_{\tilde{n}}) \in \mathbb{R}^{\tilde{n} \times \tilde{n}}$, and the vectors
$r_{xz}$, $r_{fw}$, $r_b$ and $r_c$ define the left-hand side terms of (7). Note that we did not include the
slacks equation $x + f = \bar{x}$ in (7). Instead we replaced the slacks $f$ by
$\bar{x} - x$ (thus, $F = \bar{X} - X$ in (7)), reducing by $\tilde{n}$ the number of equations
and variables. The solutions of system (7), considering inequalities as
strict inequalities, for different $\mu$ values give rise to an arc of strictly
feasible points known as the central path. As $\mu$ tends to 0, the solutions
of (7) converge to that of the original primal and dual problems. A
path-following algorithm attempts to follow the central path. Figure 1
shows a damped version of Newton’s iteration applied to the nonlinear
system (7). We use it for the multicommodity specialization. Note that
the matrix $\Theta$ computed at step 3 is a positive definite diagonal matrix,
because of the way it is formed from positive definite diagonal matrices.
A more comprehensive description of the algorithm can be found in [23].

Algorithm path-following($A$, $b$, $c$, $\bar{x}$)

1 Initialize $\xi = (x, y, z, w)$, where $(x, z, w) > 0$ and $x < \bar{x}$

2 while $\xi$ is not optimal do

3 $\Theta = (X^{-1}Z + F^{-1}W)^{-1}$

4 $r = F^{-1} r_{fw} + r_c - X^{-1} r_{xz}$

Compute direction:

5 $(A \Theta A^T)\,dy = r_b + A \Theta r$

6 $dx = \Theta (A^T dy - r)$

7 $dw = F^{-1}(r_{fw} + W dx)$

8 $dz = r_c + dw - A^T dy$

9 Update $\mu$

10 Compute $\alpha$

11 $\xi \leftarrow \xi + \alpha\, d\xi$

12 end-while

Figure 1 Path-following algorithm.
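The direction formulas of Figure 1 can be recovered by linearizing the four residual equations of (7). The following derivation is a sketch written with the residual names $r_{xz}$, $r_{fw}$, $r_b$, $r_c$ for the left-hand sides of (7) and with $F = \bar{X} - X$; it is meant only as a consistency check of steps 3 to 8, not as a replacement for the description in [23].

```latex
\begin{align*}
% Newton system for (7)
Z\,dx + X\,dz &= r_{xz}, &
-W\,dx + F\,dw &= r_{fw}, \\
A\,dx &= r_b, &
A^T dy + dz - dw &= r_c. \\[4pt]
% eliminate dw and dz
dw &= F^{-1}(r_{fw} + W\,dx), &
dz &= r_c + dw - A^T dy. \\[4pt]
% substitute dz into the first equation and collect dx
(X^{-1}Z + F^{-1}W)\,dx &= A^T dy - r, &
r &= F^{-1} r_{fw} + r_c - X^{-1} r_{xz}. \\[4pt]
% with \Theta = (X^{-1}Z + F^{-1}W)^{-1} and A\,dx = r_b
dx &= \Theta\,(A^T dy - r), &
(A \Theta A^T)\,dy &= r_b + A \Theta r.
\end{align*}
```

Since $X^{-1}Z$ and $F^{-1}W$ are diagonal with positive entries at strictly feasible points, $\Theta$ is diagonal and positive definite, in agreement with the remark on step 3.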

The main computational burden of the algorithm is the solution of
the positive definite system

$$(A \Theta A^T)\,dy = \bar{b} \tag{8}$$

at step 5 of Figure 1 ($\bar{b}$ in (8) denotes the right-hand side $r_b + A \Theta r$ of
the system). General interior-point codes attempt to solve (8) through a
Cholesky factorization $LL^T = P(A \Theta A^T)P^T$, where $P$ denotes a permutation
matrix obtained by some heuristic. However, even for such good
permutation matrices as those obtained by the minimum degree ordering
or minimum local fill-in heuristics, when $A$ is the multicommodity
constraints matrix defined in (2), the Cholesky factorization $LL^T$ turns
out to be fairly dense, making this procedure computationally expensive.
This is shown in Figure 2, in which the sparsity patterns of both $A$ and
$L + L^T$ are depicted for a multicommodity problem with 64 nodes, 524
arcs and 4 commodities, using the state-of-the-art interior-point code
BPMPD [16].

Figure 2 (a) Sparsity pattern of a multicommodity constraint matrix,
(b) Sparsity pattern of the factorization of $P(A \Theta A^T)P^T$.



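The specialized interior-point method suggested in [6] considers the
structure of $A$ presented in (2), and the following partitioning for the
diagonal matrix $\Theta$

$$\Theta = \mathrm{diag}\bigl(\Theta^{(1)}, \dots, \Theta^{(k)}, \Theta_{mc}\bigr), \tag{9}$$

where $\Theta^{(i)} \in \mathbb{R}^{n \times n}$ and $\Theta_{mc} \in \mathbb{R}^{n \times n}$ are related to the flows of
commodity $i$ and the slacks $s_{mc}$ respectively. It is straightforward to see
that the structure of $A \Theta A^T$ is

$$A \Theta A^T = \begin{pmatrix} B & C \\ C^T & D \end{pmatrix}, \tag{10}$$

where $B \in \mathbb{R}^{km \times km}$ is the block diagonal matrix

$$B = \mathrm{diag}\bigl(A_N \Theta^{(i)} A_N^T,\ i = 1, \dots, k\bigr), \tag{11}$$

each block being a square matrix of dimension $m$, where $C \in \mathbb{R}^{km \times n}$ is
defined as

$$C^T = \bigl[\,\Theta^{(1)} A_N^T \ \dots \ \Theta^{(k)} A_N^T\,\bigr], \tag{12}$$

and where $D \in \mathbb{R}^{n \times n}$ corresponds to the lower diagonal submatrix of
$A \Theta A^T$:

$$D = \Theta_{mc} + \sum_{i=1}^{k} \Theta^{(i)}. \tag{13}$$

Since $\Theta$ is diagonal and positive definite, it follows that $D$ is also a
positive definite diagonal matrix.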

Using the above structure of $A \Theta A^T$, and partitioning vectors $dy$ and
$\bar{b}$ accordingly, the solution of (8) is reduced to

$$(D - C^T B^{-1} C)\,dy_2 = (\bar{b}_2 - C^T B^{-1} \bar{b}_1) = \beta_2, \tag{14}$$

$$B\,dy_1 = (\bar{b}_1 - C\,dy_2) = \beta_1, \tag{15}$$

where $\beta_2$ and $\beta_1$ denote the right-hand sides of (14) and (15) respectively.
The matrix

$$S = D - C^T B^{-1} C \tag{16}$$

is known as the Schur complement. To solve (14) and (15) efficiently,
we only need to deal with systems involving $B$ and $S$. Systems with the
matrix $B$ can be decomposed into $k$ smaller ones of dimension $m$ with
matrices $A_N \Theta^{(i)} A_N^T$, $i = 1, \dots, k$, according to (11).

The system (14) cannot be solved using a direct method (e.g.,
factorization of the Schur complement), since this would mean forming the
matrix $S$, which is computationally prohibitive. Instead, we suggest using
a conjugate gradient method, by virtue of the following result (see
[6] for a proof).

Proposition 1 The Schur complement matrix $S = D - C^T B^{-1} C$
defined in (16) is symmetric and positive definite at each iteration of the
path-following algorithm.

The main drawback of the conjugate gradient method is its slow
convergence, especially when (5) and (6) are close to their solution point
(the Schur complement becomes more ill-conditioned). It seems more
reliable to use a preconditioned conjugate gradient (PCG) algorithm.
The preconditioner that will be used consists of an approximation of the
inverse of $S$, and it is based on Proposition 2. A proof of this result can
be found in [6].

Proposition 2 The inverse of $S = D - C^T B^{-1} C$ can be computed as

$$S^{-1} = \Bigl(\sum_{i=0}^{\infty} (D^{-1} Q)^i\Bigr) D^{-1}, \tag{17}$$

where

$$Q = C^T B^{-1} C. \tag{18}$$

The preconditioner is then obtained by truncating the power series (17)
at the term with index $i = \phi$, say. Clearly, the higher $\phi$, the better the
preconditioning, and the fewer iterations of the PCG will be required.
However, each new term in the preconditioner, after the first one, means
solving one additional system with matrix $B$, which increases the cost
of each PCG iteration. Therefore, we must balance two objectives:
reducing the number of PCG iterations and the number of systems to be
solved. Several numerical experiments have shown that the best results
are obtained for $\phi = 0$ (in this case the preconditioner is simply the
diagonal matrix $D^{-1}$) and, in some problems, for $\phi = 1$. The algorithm uses $\phi = 0$
as the default value. The extensive computational experience reported
in [6] proved the efficiency of this specialized interior-point algorithm.
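Applying the truncated series to a residual vector only needs divisions by the diagonal of $D$ and products with $Q$. The sketch below illustrates this; the function name `apply_preconditioner`, the list-based vectors and the callable `apply_Q` (which would hide the products with $C$, $C^T$ and the solves with $B$) are assumptions made for this example.

```python
def apply_preconditioner(r, d, apply_Q, phi=0):
    """Return z = (sum_{i=0}^{phi} (D^{-1} Q)^i) D^{-1} r.

    r       : residual vector (list of floats)
    d       : diagonal entries of D (all positive)
    apply_Q : callable computing Q v = C^T B^{-1} C v (assumed given)
    phi     : index at which the power series is truncated
    """
    z = [ri / di for ri, di in zip(r, d)]      # term i = 0: D^{-1} r
    term = z[:]
    for _ in range(phi):
        q = apply_Q(term)                      # one extra solve with B per term
        term = [qi / di for qi, di in zip(q, d)]
        z = [zi + ti for zi, ti in zip(z, term)]
    return z


# Toy data: D = diag(2, 4), Q = I, so S = D - Q = diag(1, 3) and S^{-1} r = (1, 1).
d = [2.0, 4.0]
r = [1.0, 3.0]
z0 = apply_preconditioner(r, d, lambda v: v[:], phi=0)   # diagonal preconditioner
z1 = apply_preconditioner(r, d, lambda v: v[:], phi=1)   # one extra series term
```

With $\phi = 0$ no call to `apply_Q` is made at all, which is why the default choice keeps each PCG iteration cheap.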

4. PARALLELIZATION OF THE 
ALGORITHM 

Computing the direction of the dual variables $dy$ at step 5 of the
path-following method in Figure 1 is by far the most costly procedure
to be performed by the specialized multicommodity algorithm. Figure
3 summarizes the steps required to compute $dy$, according to (14) and
(15). Looking at Figure 3 we see that all of the steps require either
a factorization of $B$, or a backward and forward substitution with this
factorization, or products of vectors with matrices $C$ or $C^T$. In fact,
these will be the only four procedures to be run in parallel, following a
coarse-grained scheme. Considering the partitioning of $B$ and $C$ defined
in (11) and (12), these four procedures are implemented as follows:

1 Factorization of $B$. Perform in parallel the $k$ factorizations of
$A_N \Theta^{(i)} A_N^T$. Note that the current implementation uses sequential
Cholesky solvers. Using parallel implementations of Cholesky
decompositions, such as those described in [4, 8, 15], each of the $k$
factorizations could itself be performed in parallel, improving the
efficiency of the code.

2 Solution of system $Br = s$, for any $s \in \mathbb{R}^{km}$. Solve in parallel for
each diagonal block of $B$.

3 Computation of $w = Cv$, for any $v \in \mathbb{R}^n$. Using (12), we compute
in parallel $w^{(i)} = A_N \Theta^{(i)} v$, $i = 1, \dots, k$, so $w^{(i)}$ has the components
of $w$ related to commodity $i$.

4 Computation of $v = C^T w$, for any $w \in \mathbb{R}^{km}$. Using (12), we
compute in parallel the temporary vectors $v^{(i)} = \Theta^{(i)} A_N^T w^{(i)}$, $i =
1, \dots, k$. We then add the temporary vectors sequentially ($v =
\sum_{i=1}^{k} v^{(i)}$), obtaining $v$. Due to its low computational cost, the
addition of the $k$ vectors has not been parallelized.
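Procedures 3 and 4 above can be sketched with one task per commodity. The code below is only a structural illustration and makes several assumptions: dense list-of-lists matrices, each diagonal $\Theta^{(i)}$ stored as a list of its diagonal entries, hypothetical function names, and Python threads from `concurrent.futures` in place of the loop directives of the actual implementation (CPython threads will not give real speedups for this arithmetic).

```python
from concurrent.futures import ThreadPoolExecutor


def matvec(M, x):
    """Dense matrix-vector product M x."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def matvec_T(M, y):
    """Dense transposed product M^T y."""
    return [sum(M[r][c] * y[r] for r in range(len(M))) for c in range(len(M[0]))]


def product_C(A_N, thetas, v, workers=2):
    """w = C v: block i is w^(i) = A_N Theta^(i) v, one task per commodity."""
    def block(theta):
        return matvec(A_N, [t * x for t, x in zip(theta, v)])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(block, thetas))


def product_CT(A_N, thetas, w_blocks, workers=2):
    """v = C^T w: temporaries Theta^(i) A_N^T w^(i) in parallel, then a sequential sum."""
    def block(args):
        theta, w = args
        return [t * x for t, x in zip(theta, matvec_T(A_N, w))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(block, zip(thetas, w_blocks)))
    v = [0.0] * len(thetas[0])
    for part in partial:               # cheap addition, kept sequential
        v = [a + b for a, b in zip(v, part)]
    return v


# Tiny example: 2 node rows, 3 arcs, 2 commodities.
A_N = [[1, -1, 0], [0, 1, -1]]
thetas = [[1, 2, 3], [2, 2, 2]]
w = product_C(A_N, thetas, [1, 1, 1])
v = product_CT(A_N, thetas, w)
```

Because every commodity performs the same products with the same matrix $A_N$, each task does the same amount of arithmetic, which is the load-balancing property the text relies on.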
Procedure $A \Theta A^T dy = \bar{b}$ ($A$, $\Theta$, $\bar{b}$, $dy$)

1 Factorize the $k$ blocks of $B$

2 Compute $\beta_2 = \bar{b}_2 - C^T B^{-1} \bar{b}_1$

3 PCG: Solve $(D - C^T B^{-1} C)\,dy_2 = \beta_2$

3.1 while $dy_2$ is not optimal do

3.i Compute $w = (D - C^T B^{-1} C)v$

3.n end while

4 Compute $\beta_1 = \bar{b}_1 - C\,dy_2$

5 Solve $B\,dy_1 = \beta_1$

6 Return: $dy = (dy_1^T \ dy_2^T)^T$

Figure 3 Procedure for solving systems (14) and (15).
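The procedure of Figure 3 can be sketched in a few lines for small dense data. This is a sequential illustration under stated assumptions, not the implementation of [6]: matrices are lists of lists, each diagonal $\Theta^{(i)}$ is a list of its diagonal entries, the blocks of $B$ are solved by plain Gaussian elimination instead of a sparse Cholesky factorization, the PCG uses the $\phi = 0$ preconditioner $D^{-1}$, and all function names are hypothetical.

```python
def solve_spd(M, rhs):
    """Solve M x = rhs for a small dense SPD matrix (no pivoting needed)."""
    n = len(M)
    a = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for c in range(n):
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n + 1):
                a[r][j] -= f * a[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][j] * x[j] for j in range(r + 1, n))) / a[r][r]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_normal_equations(A_N, thetas, theta_mc, b1, b2, tol=1e-12):
    """Return (dy1, dy2); b1 is a list of k blocks of length m, b2 has length n."""
    m, n = len(A_N), len(A_N[0])
    # Step 1: the k diagonal blocks A_N Theta^(i) A_N^T of B, and the diagonal of D.
    B = [[[sum(A_N[r][j] * th[j] * A_N[c][j] for j in range(n))
           for c in range(m)] for r in range(m)] for th in thetas]
    D = [theta_mc[j] + sum(th[j] for th in thetas) for j in range(n)]

    def C(v):        # block i of C v is A_N Theta^(i) v
        return [[sum(A_N[r][j] * th[j] * v[j] for j in range(n))
                 for r in range(m)] for th in thetas]

    def CT(w):       # C^T w = sum_i Theta^(i) A_N^T w^(i)
        return [sum(th[j] * sum(A_N[r][j] * wi[r] for r in range(m))
                    for th, wi in zip(thetas, w)) for j in range(n)]

    def B_solve(w):  # k independent solves, one per commodity
        return [solve_spd(Bi, wi) for Bi, wi in zip(B, w)]

    def S(v):        # Schur complement product, S is never formed
        q = CT(B_solve(C(v)))
        return [D[j] * v[j] - q[j] for j in range(n)]

    # Step 2: beta2 = b2 - C^T B^{-1} b1
    t = CT(B_solve(b1))
    beta2 = [b2[j] - t[j] for j in range(n)]
    # Step 3: PCG on S dy2 = beta2 with preconditioner D^{-1}
    dy2 = [0.0] * n
    r = beta2[:]
    z = [r[j] / D[j] for j in range(n)]
    p = z[:]
    rz = dot(r, z)
    for _ in range(10 * n):
        if dot(r, r) ** 0.5 <= tol:
            break
        Sp = S(p)
        alpha = rz / dot(p, Sp)
        dy2 = [dy2[j] + alpha * p[j] for j in range(n)]
        r = [r[j] - alpha * Sp[j] for j in range(n)]
        z = [r[j] / D[j] for j in range(n)]
        rz_new = dot(r, z)
        p = [z[j] + (rz_new / rz) * p[j] for j in range(n)]
        rz = rz_new
    # Steps 4 and 5: beta1 = b1 - C dy2, then B dy1 = beta1
    Cd = C(dy2)
    beta1 = [[bi[r] - ci[r] for r in range(m)] for bi, ci in zip(b1, Cd)]
    return B_solve(beta1), dy2
```

Every use of $B$ goes through `B_solve`, and every use of $C$ or $C^T$ through `C` and `CT`; these are exactly the places where the per-commodity work would be distributed over processors.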



From our computational experience, it can be stated that for large 
problems the above four procedures represent more than 97% of the 
execution time (see Figure 6 in Section 5). This guarantees that the 
fraction of the sequential region will be small enough for it not to be 
a major bottleneck. It should also be noted that, in each of the four 
parallelized procedures, the number of floating point operations for each 
commodity (and thus for each processor) will be the same, which guar- 
antees the load balancing between processors and avoids unnecessary 
delays. 



4.1. PARALLEL PROGRAMMING 
ENVIRONMENT 

The parallel implementation of the multicommodity interior-point
algorithm was developed on a Silicon Graphics Origin2000 (SGI O2000)
server. The SGI O2000 is a shared memory machine, main memory
being physically distributed across several processors. In addition, each
processor has a first level cache memory of 64Kb (32Kb for instructions,
32Kb for data), and a secondary unified cache memory of 4Mb (for both
instructions and data).

The main advantage of using a shared memory machine such as the
SGI O2000 is the ease with which an existing sequential code can be
ported, in comparison with distributed parallel environments. The latter
require the use of one of the message passing communication standards
(e.g., MPI or PVM), whereas the SGI O2000 provides a more
user-friendly system based on including special directives in sequential
C or Fortran codes [22]. These directives, usually located at the
beginning of loops, create different threads of execution that will run
different sections of the iterative region in parallel. Moreover, unlike
distributed systems, which force the programmer to allocate data
structures between processors and to keep communication low, the parallel
environment of the SGI O2000 automatically attempts to perform these
tasks. The default data distribution provided, however, can result in an
excessive number of cache misses and page faults from the local memory
of each processor, the performance of the parallel executions thus
being severely impaired. Although advanced directives enable this feature
to be controlled, the computational results presented in Section 5 were
obtained with the default data distribution across processors provided
by the system. This default distribution was also used in [4]. Further
details about the use of the parallel directives of the SGI O2000 can be
found in [22].
4.2. PERFORMANCE MEASURES

The performance measures presented below will be considered in
Section 5 when reporting the computational results obtained. All of these
performance measures are widely used in the field of parallel computing
[5]. Considering a particular parallel implementation of an algorithm,
we will denote the execution time obtained with $p$ processors by $T_p$. The
speedup $S_p$ obtained with $p$ processors can thus be defined as

$$S_p = \frac{T_1}{T_p}.$$

The fraction of the total execution time consumed in the sequential
version by the parallel region will be denoted by $f$. Values of $f$ close to 1
guarantee good theoretical speedups, whereas the bottleneck represented
by the sequential region increases with $1 - f$. This is summarized by
Amdahl’s law, which provides a theoretical upper bound $\bar{S}_p$ for the best
possible speedup

$$\bar{S}_p = \frac{1}{f/p + (1 - f)} \le \frac{1}{1 - f}.$$

Finally, we can define the efficiency with $p$ processors as

$$E_p = \frac{S_p}{p}.$$

The efficiency represents the fraction of the time that a particular processor (of the
$p$ available) is usefully employed during the execution of the algorithm.
Note that when $f = 1$, we have $\bar{S}_p = p$ and $E_p = 1$.
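These three measures are simple enough to compute directly. The helper functions below are an illustrative sketch with hypothetical names; the sample call uses a parallel fraction of 0.97, the lower bound quoted earlier for large problems, and made-up timings.

```python
def speedup(t1, tp):
    """Speedup S_p = T_1 / T_p from measured execution times."""
    return t1 / tp


def amdahl_bound(f, p):
    """Amdahl upper bound 1 / (f/p + (1 - f)) for parallel fraction f."""
    return 1.0 / (f / p + (1.0 - f))


def efficiency(t1, tp, p):
    """Efficiency E_p = S_p / p."""
    return speedup(t1, tp) / p


# With 97% of the sequential time in the parallel region and 8 processors,
# the attainable speedup is bounded by about 6.6, and by about 33 for any p.
bound_8 = amdahl_bound(0.97, 8)
limit = 1.0 / (1.0 - 0.97)

# Hypothetical timings: 100 s sequentially, 25 s on 8 processors.
e8 = efficiency(100.0, 25.0, 8)
```

The bound shows why even a small sequential fraction matters: with $f = 0.97$ the speedup can never exceed roughly 33, however many processors are used.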

5. COMPUTATIONAL RESULTS 

The sequential code of the algorithm outlined in Section 3 was im- 
plemented and named IPM in [6]. The parallel version developed in 
this work will be denoted as pIPM. It is written mainly in C, with only 
the Cholesky factorization routines (devised by E. Ng and B. Peyton 
[18]) coded in Fortran. Both the sequential and parallel versions can be 
freely obtained for academic purposes from
http://www-eio.upc.es/castro/software.html. All the runs were carried out on the SGI
Origin2000 server located at the European Center for Parallelism of
Barcelona (CEPBA), running an IRIX64 6.5 Unix operating system. 
The main characteristics of the server are shown in Figure 4, as reported 
by the hinv (hardware inventory) command. This computer appears at 
position 275 of the TOP500 supercomputer sites list [9]. 



64 250 MHZ IP27 Processors

CPU: MIPS R10000 Processor Chip Revision: 3.4

FPU: MIPS R10010 Floating Point Chip Revision: 0.0

Main memory size: 8192 Mbytes

Instruction cache size: 32 Kbytes

Data cache size: 32 Kbytes

Secondary unified instruction/data cache size: 4 Mbytes



Figure 4 Characteristics of the SGI Origin2000 server used for the executions.



Two sets of multicommodity instances were used for the computational
experiments. The first is made up of 18 problems obtained with
Ali and Kennington’s Mnetgen generator [3]. Table 1 shows the
dimensions and optimal solutions of the Mnetgen problems. The
parameters used to generate the instances can be found in [10], and can be
retrieved from
http://www.di.unipi.it/di/groups/optimize/Data/MMCF.html#MNetGen.
Columns “m”, “n”, and “k” show the number of nodes, arcs, and
commodities. Columns “ñ” and “m̃” give the number of variables and
constraints of the linear problem (where ñ = (k + 1)n and m̃ = km + n).
Finally, column “c^T x*” gives the exact optimal objective function
value. For the last two problems no exact objective value has been
computed (an approximate solution obtained with IPM is reported in
Table 3).
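
The relation between the network size and the size of the linear problem, (k + 1)n variables and km + n constraints, can be checked against the tabulated dimensions. The sketch below is only an illustration; the function name is hypothetical, and the two sample instances are taken from the dimension tables of this section.

```python
def lp_dimensions(m: int, n: int, k: int) -> tuple[int, int]:
    """Return (variables, constraints) of the multicommodity LP.

    m, n, k are the numbers of nodes, arcs and commodities.
    Variables:   (k + 1) * n
    Constraints: k * m + n
    """
    return (k + 1) * n, k * m + n


if __name__ == "__main__":
    # (name, nodes, arcs, commodities) as listed in the dimension tables
    for name, m, n, k in [("M128-8", 128, 1089, 8), ("PDS30", 4223, 16148, 11)]:
        n_lp, m_lp = lp_dimensions(m, n, k)
        print(f"{name}: {n_lp} variables, {m_lp} constraints")
```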

Problem        m      n     k         ñ        m̃          c^T x*
M128-8       128   1089     8      9801      2113      1924133.9
M128-16      128   1114    16     18938      3162      4145079.4
M128-32      128   1141    32     37653      5237      9785961.1
M128-64      128   1171    64     76115      9363     19269824.2
M128-128     128   1204   128    155316     17588     40143200.8
M256-8       256   2165     8     19485      4213      9919483.2
M256-16      256   2308    16     39236      6404     20692883.7
M256-32      256   2314    32     76362     10506     45671076.1
M256-64      256   2320    64    150800     18704     92249381.1
M256-128     256   2358   128    304182     35126    190137259.9
M256-256     256   2204   256    566428     67740    397882591.3
M512-8       512   4373     8     39357      8469     46339269.9
M512-16      512   4620    16     78540     12812     96992237.2
M512-32      512   4646    32    153318     21030    192941834.8
M512-64      512   4768    64    309920     37536    412943158.7
M512-128     512   4786   128    617394     70322    828013599.8
M512-256     512   4810   256   1236170    135882              —
M512-512     512   4786   512   2455218    266930              —

Table 1 Dimensions and optimal solutions of the Mnetgen problems.



The second set consists of ten of the PDS (Patient Distribution
System) problems. These problems arise from a logistic model for
evacuating patients from a place of military conflict. They can be
retrieved from
http://www.di.unipi.it/di/groups/optimize/Data/MMCF.html#Pds.
Their dimensions and optimal objective functions can be found in Table
2. The meaning of the columns is the same as in Table 1.

Before performing all the executions, we studied in detail the
performance of pIPM on two particular instances, M128-128 for the
Mnetgen and PDS30 for the PDS problems. The behavior of the code with
these two instances turned out to be fairly representative of the
general behavior for each data set. Figure 5 shows the results obtained.
Each plot gives the execution time $T_p$ (left-hand vertical scale) and
the theoretical best speedup $\bar{S}_p$, observed speedup $S_p$ and
observed efficiency $E_p$ (right-hand vertical scale) for different
numbers of processors (horizontal axis). Although both problems have a
similar f value (0.88 for M128-128, 0.92 for PDS30), pIPM behaved very
differently in each case. For M128-128, the gap between $S_p$ and
$\bar{S}_p$ increases with the number of processors, whereas for PDS30
the best theoretical speedup is almost always achieved. This

Problem       m       n     k        ñ        m̃            c^T x*
PDS1        126     372    11     4464     1758     29083930523.0
PDS10      1399    4792    11    57504    20181     26727094976.0
PDS20      2857   10858    11   130296    42285     23821658640.0
PDS30      4223   16148    11   193776    62601     21385445736.0
PDS40      5652   22059    11   264708    84231     18855198824.0
PDS50      7031   27668    11   332016   105009     16603525724.0
PDS60      8423   33388    11   400656   126041     14265904407.0
PDS70      9750   38396    11   460752   145646     12241162812.0
PDS80     10989   42472    11   509664   163351     11469077462.0
PDS90     12186   46161    11   553932   180207     11087561635.0

Table 2 Dimensions and optimal solutions of the PDS problems.




Figure 5 Behavior of the algorithm with (a) the M128-128 problem and
(b) the PDS30 problem.



fact, together with the different maximum number of processors used in
the two problems (64 vs. 11), gives rise to efficiencies of
$E_{64} = 0.04$ for M128-128 (the best possible value was
$\bar{E}_{64} = 0.12$) and $E_{11} = 0.51$ for PDS30
($\bar{E}_{11} = 0.56$). It can also be observed that the execution time
$T_p$ decreases with p for PDS30, whereas for M128-128 it remains almost
the same for 8, 16, and 32, and slightly increases for 64 processors.
This lack of scalability of a shared memory machine when using a large
number of processors was also reported in [4]. However, we believe that
these results can be improved by exploiting the data distribution
between processors, as suggested in Subsection 4.1. This additional work
remains to be done.

On the basis of the results in Figure 5 we decided to execute the
Mnetgen problems with 8, 16, and 32 processors (always guaranteeing
k ≥ p), whereas 6 and 11 were used for the PDS ones. Tables 3 and 4
show the results obtained for the two sets of problems. Column
$c^T x^*_{pIPM}$ gives the optimal solution computed by pIPM. The
relative error with respect to the exact optimal solutions of Tables 1
and 2 ranges between 10^-? and 10^-? for all the cases. Column f gives
the f value (fraction represented by the parallel region in the
sequential version). Column p is the number of processors used in the
execution. $T_p$ denotes the execution time. For p = 1, this time means
CPU time, as reported by the times Unix command, and was obtained by
executing the instances on a single processor to reduce context
switches. For p > 1, $T_p$ denotes wall-clock time, and was obtained by
executing pIPM alone on the server to improve the accuracy of the time
measures. Columns $S_p$ and $E_p$ give the observed speedups and
efficiencies, and, enclosed in parentheses, their best theoretical
values $\bar{S}_p$ and $\bar{E}_p$ respectively. Note that, for some of
the PDS problems, the observed speedups and efficiencies are greater
than their theoretical upper bounds. These superlinear speedups can be
explained by: firstly, a lack of accuracy in the measures of both $T_1$
and $T_p$, p > 1; and secondly, as suggested in [7], a reduction of the
number of cache misses in the parallel execution with respect to the
sequential one due to the data distribution between processors.
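
The comparison between observed and theoretical values can be reproduced from the timings alone. The following sketch is an illustration under stated assumptions: the helper names are hypothetical, and the sample figures (f = 0.92, 1017.7 s on one processor, 213.4 s on six) are the PDS40 entries of Table 4.

```python
def observed_metrics(t1: float, tp: float, p: int) -> tuple[float, float]:
    """Observed speedup T_1 / T_p and efficiency (speedup / p)."""
    speedup = t1 / tp
    return speedup, speedup / p


def amdahl_bound(f: float, p: int) -> float:
    """Theoretical best speedup for parallel fraction f on p processors."""
    return 1.0 / (f / p + (1.0 - f))


def exceeds_bound(f: float, t1: float, tp: float, p: int) -> bool:
    """True when the measured speedup is larger than the Amdahl bound."""
    speedup, _ = observed_metrics(t1, tp, p)
    return speedup > amdahl_bound(f, p)


if __name__ == "__main__":
    # PDS40 with p = 6, figures taken from Table 4
    s, e = observed_metrics(1017.7, 213.4, 6)
    print(f"observed speedup {s:.1f}, efficiency {e:.2f}, "
          f"bound {amdahl_bound(0.92, 6):.1f}")
    print("exceeds bound:", exceeds_bound(0.92, 1017.7, 213.4, 6))
```

Because f is itself a measured and rounded quantity, a small excess over the bound is compatible with the measurement inaccuracy mentioned in the text.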




Figure 6 Fraction f represented by the parallel section in (a) the
Mnetgen problems and (b) the PDS problems.



Some of the information in Tables 3 and 4 is summarized in Figures 6,
7, and 8. Figure 6 shows the evolution of the f value with the
number of variables of the problem, for both the Mnetgen and PDS
instances. Clearly, this value increases with the size of the problem,
and for the largest ones it is greater than 0.97, as stated in Section 4
above. Accordingly, the bottleneck associated with the sequential
version is consistently reduced for larger and larger instances, which
results in a (theoretically) good behavior of the parallel
implementation.

Figure 7 (a) Execution times for the M512-* problems.
(b) Efficiencies for the M512-* problems.

Figure 8 (a) Execution times for the PDS problems.
(b) Efficiencies for the PDS problems.



Figures 7 and 8 show the execution times and efficiencies for the
M512-* and PDS problems, for different numbers of processors. For the
M512-* problems the best improvements are clearly obtained when moving
from 1 to 8 processors. It is also clear that efficiencies tend to
decrease with the number of processors, and that, for any p, they remain
stable with the size of the problem. These results for the Mnetgen
problems are not as good as those observed for parallel implementations
of decomposition approaches for multicommodity flows (e.g., [7]). pIPM
behaves better on the PDS problems. For instance, speedups of
about 5 are obtained for the largest problems with p = 6, and execution
times are almost halved when moving from 6 to 11 processors.
Figure 8(b) shows that, unlike in the M512-* problems, efficiencies for
6 and 11 processors are almost the same, and that they improve
with the dimension of the problem. The scalability of the code with the
PDS problems is not observed, in general, with other parallel
implementations of interior-point algorithms using a similar number of
processors (e.g., [4, 8, 12]).

6. CONCLUSIONS AND FUTURE 
RESEARCH 

The parallel code pIPM introduced in this work can be an efficient 
and promising tool for the solution of certain types of large and difficult 
multicommodity problems. We have found that it is especially appropri- 
ate for those instances with large networks and few commodities, where 
a small number of processors is required. 

However, it can be improved with many additional refinements, which
form part of the further work to be done. Among these we would
mention:

■ The fraction f of the parallel region should be increased to
guarantee better theoretical speedups. This would mean parallelizing
additional routines while keeping overhead costs low.

■ It would be worth attempting to use higher order preconditioners
(e.g., φ > 0) for the solution of the system with the Schur
complement. Although for sequential executions this would reduce the
performance of the algorithm, it could increase the fraction f of
the parallel region, providing better parallel executions.

■ The gap between the observed and theoretical efficiencies for
problems with many commodities, such as the Mnetgen ones, should be
reduced. This could be attempted by considering and exploiting
the data distribution across the various processors. The reduction
of the number of cache misses and remote memory accesses could
mean improvements by a factor of two.

Problem     c^T x*_pIPM      f     p      T_p   S_p (\bar{S}_p)   E_p (\bar{E}_p)
M128-8        1924113.4   0.72     1      3.1     1.0  (1.0)       1.00 (1.00)
                                   8      1.7     1.8  (2.7)       0.22 (0.34)
M128-16       4145089.5   0.86     1     16.2     1.0  (1.0)       1.00 (1.00)
                                   8      7.3     2.2  (4.1)       0.27 (0.51)
                                  16      7.7     2.1  (5.2)       0.13 (0.32)
M128-32       9785902.8   0.87     1     36.4     1.0  (1.0)       1.00 (1.00)
                                   8     15.8     2.3  (4.2)       0.28 (0.52)
                                  16     14.8     2.5  (5.4)       0.15 (0.33)
                                  32     18.8     1.9  (6.4)       0.06 (0.19)
M128-64      19269830.9   0.92     1    195.4     1.0  (1.0)       1.00 (1.00)
                                   8     79.9     2.5  (5.2)       0.30 (0.64)
                                  16     73.9     2.6  (7.3)       0.16 (0.45)
                                  32     71.9     2.7  (9.3)       0.08 (0.29)
M128-128     40143266.0   0.89     1    395.9     1.0  (1.0)       1.00 (1.00)
                                   8    150.1     2.6  (4.4)       0.33 (0.55)
                                  16    137.3     2.9  (5.9)       0.18 (0.36)
                                  32    138.6     2.9  (7.0)       0.08 (0.21)
M256-8        9919478.8   0.88     1     19.4     1.0  (1.0)       1.00 (1.00)
                                   8      7.7     2.5  (4.3)       0.31 (0.54)
M256-16      20692714.5   0.91     1     69.0     1.0  (1.0)       1.00 (1.00)
                                   8     25.4     2.7  (4.9)       0.33 (0.61)
                                  16     23.5     2.9  (6.8)       0.18 (0.42)
M256-32      45671345.2   0.95     1    342.0     1.0  (1.0)       1.00 (1.00)
                                   8    123.5     2.8  (5.8)       0.34 (0.73)
                                  16     91.1     3.8  (8.9)       0.23 (0.55)
                                  32     98.0     3.5 (12.1)       0.10 (0.37)
M256-64      92249411.9   0.94     1    586.2     1.0  (1.0)       1.00 (1.00)
                                   8    199.0     3.0  (5.5)       0.36 (0.69)
                                  16    146.5     4.0  (8.2)       0.25 (0.51)
                                  32    143.8     4.1 (10.8)       0.12 (0.33)
M256-128    190138392.4   0.96     1   3352.8     1.0  (1.0)       1.00 (1.00)
                                   8    627.6     5.3  (6.2)       0.66 (0.76)
                                  16    558.9     6.0  (9.7)       0.37 (0.60)
                                  32    511.9     6.5 (13.7)       0.20 (0.42)

Table 3 Results obtained for the Mnetgen problems.




92 Jordi Castro 



Problem      c^T x*_pIPM      f     p        T_p   S_p (\bar{S}_p)   E_p (\bar{E}_p)
M256-256     397883691.5   0.96     1     7486.0     1.0  (1.0)       1.00 (1.00)
                                    8     1597.0     4.7  (6.3)       0.58 (0.79)
                                   16     1494.0     5.0 (10.3)       0.31 (0.64)
                                   32     1274.5     5.9 (14.9)       0.18 (0.46)
M512-8        46338411.5   0.93     1      109.5     1.0  (1.0)       1.00 (1.00)
                                    8       31.9     3.4  (5.4)       0.42 (0.67)
M512-16       96992142.3   0.96     1      550.9     1.0  (1.0)       1.00 (1.00)
                                    8      140.9     3.9  (6.2)       0.48 (0.77)
                                   16      111.0     5.0  (9.9)       0.31 (0.61)
M512-32      192941650.0   0.97     1     1820.3     1.0  (1.0)       1.00 (1.00)
                                    8      395.1     4.6  (6.7)       0.57 (0.83)
                                   16      318.4     5.7 (11.3)       0.35 (0.70)
                                   32      313.7     5.8 (17.1)       0.18 (0.53)
M512-64      412943655.4   0.97     1     4473.1     1.0  (1.0)       1.00 (1.00)
                                    8     1190.8     3.8  (6.8)       0.47 (0.84)
                                   16      896.2     5.0 (11.5)       0.31 (0.71)
                                   32      783.5     5.7 (17.7)       0.17 (0.55)
M512-128     828014985.2   0.98     1    18156.2     1.0  (1.0)       1.00 (1.00)
                                    8     4302.4     4.2  (7.1)       0.52 (0.88)
                                   16     3168.2     5.7 (12.6)       0.35 (0.78)
                                   32     2667.0     6.8 (20.5)       0.21 (0.64)
M512-256    1649358223.7   0.98     1    50178.7     1.0  (1.0)       1.00 (1.00)
                                    8    10745.5     4.7  (7.2)       0.58 (0.90)
                                   16     7982.6     6.3 (13.1)       0.39 (0.81)
                                   32     6346.4     7.9 (21.8)       0.24 (0.68)
M512-512    3487594874.0   0.98     1   145018.9     1.0  (1.0)       1.00 (1.00)
                                    8    33655.5     4.3  (7.1)       0.53 (0.88)
                                   16    21630.5     6.7 (12.6)       0.41 (0.78)
                                   32    18312.8     7.9 (20.5)       0.24 (0.64)

Table 3 (continued) Results obtained for the Mnetgen problems.

Problem      c^T x*_pIPM      f     p       T_p   S_p (\bar{S}_p)   E_p (\bar{E}_p)
PDS1       29083850483.5   0.57     1       0.7     1.0 (1.0)        1.00 (1.00)
                                    6       0.5     1.4 (1.9)        0.23 (0.31)
                                   11       0.5     1.5 (2.1)        0.13 (0.18)
PDS10      26726869329.4   0.79     1      46.2     1.0 (1.0)        1.00 (1.00)
                                    6      19.3     2.4 (2.9)        0.40 (0.49)
                                   11      16.6     2.8 (3.6)        0.25 (0.32)
PDS20      23820311896.6   0.88     1     234.4     1.0 (1.0)        1.00 (1.00)
                                    6      70.1     3.3 (3.7)        0.55 (0.62)
                                   11      55.4     4.2 (5.0)        0.38 (0.45)
PDS30      21385482088.8   0.92     1     799.1     1.0 (1.0)        1.00 (1.00)
                                    6     193.7     4.1 (4.3)        0.68 (0.72)
                                   11     141.9     5.6 (6.2)        0.51 (0.56)
PDS40      18852465159.0   0.92     1    1017.7     1.0 (1.0)        1.00 (1.00)
                                    6     213.4     4.8 (4.3)        0.79 (0.72)
                                   11     141.3     7.2 (6.2)        0.65 (0.56)
PDS50      16601676244.4   0.93     1    2003.1     1.0 (1.0)        1.00 (1.00)
                                    6     390.4     5.1 (4.5)        0.85 (0.74)
                                   11     254.1     7.9 (6.6)        0.71 (0.59)
PDS60      14265869776.2   0.96     1    4924.1     1.0 (1.0)        1.00 (1.00)
                                    6     917.5     5.4 (5.1)        0.89 (0.85)
                                   11     551.5     8.9 (8.2)        0.81 (0.74)
PDS70      12240890481.6   0.97     1    7055.5     1.0 (1.0)        1.00 (1.00)
                                    6    1574.4     4.5 (5.2)        0.74 (0.87)
                                   11     881.5     8.0 (8.5)        0.72 (0.76)
PDS80      11468486724.2   0.97     1    7737.3     1.0 (1.0)        1.00 (1.00)
                                    6    1628.6     4.8 (5.2)        0.79 (0.86)
                                   11     928.4     8.3 (8.3)        0.75 (0.75)
PDS90      11087270971.3   0.97     1   12059.8     1.0 (1.0)        1.00 (1.00)
                                    6    2293.7     5.3 (5.3)        0.87 (0.87)
                                   11    1355.9     8.9 (8.6)        0.80 (0.78)

Table 4 Results obtained for the PDS problems.

Abstract In this paper we introduce the behavioral approach as a mathematical
language for describing dynamical systems, in particular systems
modeled by high order constant coefficient linear differential equations. We
investigate what data have to be added in order to express the influence
of the environment and the initial conditions on the system. We give
an algorithm to check whether these additional constraints are
satisfied by a (unique) trajectory. We define the concepts of observability
and controllability, and present algorithms which provide a constructive
verification of such properties.



1. INTRODUCTION 

The purpose of this paper is two-fold. Firstly, we introduce some of 
the main ideas of behavioral systems theory as an abstract framework 
for modeling and analysis of dynamical systems. Secondly, we show 
how algorithms can be developed which allow the analysis of system 
properties, even at the high level of generality at which we shall be 
working. 

The starting point of our discussion is the definition of a system as 
the set of feasible trajectories of the variables whose dynamics we are 
modeling. Such a definition has two crucial aspects: on the one hand 
it abandons the idea of a system as an input/output map, i.e. as a
signal processor, and simply makes it a relation between variables which 
evolve in time, but for which no hierarchical or cause/effect structure 
is given a priori. On the other hand, it distinguishes clearly between 
the system (i.e. the feasible trajectories) and its representations (e.g. 
equations, graphs, grammar rules, etc.). 

Crucial definitions such as controllability and observability are also 
given in a representation-free fashion. 

At the level of representations, we concentrate on linear differential 
systems, i.e. systems whose trajectories can be described as solutions 
to a set of linear differential equations. For such systems we describe 
algorithms which allow one to check controllability and observability, and 
which allow one to verify whether trajectories satisfying given constraints 
can be simulated. 

One crucial issue we shall overlook for lack of space is that of describ- 
ing systems as interconnections of smaller subsystems. Such a concept, 
very typical in engineering thinking, finds in the behavioral framework 
a nice formal description. It also motivates the introduction of the con- 
cept of latent variables, namely variables which we are not interested in 
modeling, but which we have to take into account, in order to describe 
the subsystems and the interconnections that provide the final model. 

The main references to the ideas we will discuss are [7, 9]; in [10], 
control issues are also addressed from this point of view; finally, [8, 6] 
use this same framework to study properties of systems described by 
partial difference and differential equations. 

2. MODELING A DYNAMICAL SYSTEM 

If one had to define what the purpose of modeling a dynamical system
is, one could reasonably say it is describing how a set of variables of
interest, call them $w$, evolve as a function of time. If we indicate by T
the time axis of interest (typically $T = \mathbb{R}$ for continuous time
models and $T = \mathbb{Z}$ for discrete time ones), and by W the space in
which the variables of interest take on their values (e.g.
$W = \mathbb{R}^q$ if there are q real valued variables), then the $w$’s
are elements of $W^T$, with $W^T$ denoting the set of maps from T to W.
The model of the system tells us that only a subset of such trajectories
can actually happen, namely the subset that complies with the laws of
the system. We will indicate this set of admissible trajectories as
$\mathfrak{B}$ and refer to it as the behavior of the dynamical system.

Formalizing the above discussion, we define a dynamical system as
a triple $\Sigma = (W, T, \mathfrak{B})$, with W the signal space, T the
time axis and $\mathfrak{B} \subseteq W^T$ the behavior of the system.

The above definition is the cornerstone of behavioral systems theory, 
and in essence it defines a model as an exclusion law, a rule that allows 
us to pick a subset of feasible trajectories out of a set of possible ones. 
Given its crucial importance, we illustrate it with two examples. 

1. Newton’s second law imposes a restriction that relates the position
   $q$ of a point mass to the force $F$ acting on it. This relation is
   $F = m \frac{d^2}{dt^2} q$, with m the mass. This is a dynamical
   system with $T = \mathbb{R}$, $W = \mathbb{R}^3 \times \mathbb{R}^3$
   and behavior $\mathfrak{B}$ consisting of all maps
   $t \in \mathbb{R} \mapsto (q, F)(t) \in \mathbb{R}^3 \times \mathbb{R}^3$
   that satisfy $F = m \frac{d^2}{dt^2} q$.

2. Kepler’s laws describe the possible motions of the planets in the
   solar system. They define a dynamical system with $T = \mathbb{R}$,
   $W = \mathbb{R}^3$, and $\mathfrak{B}$ the set of maps
   $w : \mathbb{R} \to \mathbb{R}^3$ that satisfy the following laws.
   The paths w must be ellipses in $\mathbb{R}^3$ with the sun (assumed
   in fixed position) at one of the foci, the radius vector from the sun
   to the planet must sweep out equal areas in equal time, and the
   ratio of the square of the period of revolution around the ellipse to
   the cube of the major axis must be the same for all w’s in
   $\mathfrak{B}$.
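
The idea of a model as an exclusion law can be made concrete with a small numerical check. The Python sketch below is an illustration only: it considers one-dimensional motion, approximates the second derivative by a central finite difference, and tests membership at finitely many sample times, so it can reject a trajectory but never prove membership. All names are hypothetical.

```python
from typing import Callable, Iterable

Signal = Callable[[float], float]


def in_newton_behavior(position: Signal, force: Signal, mass: float,
                       times: Iterable[float],
                       h: float = 1e-3, tol: float = 1e-6) -> bool:
    """Check F = m * d^2 q / dt^2 at the sample times (1-D motion).

    The acceleration is approximated by a central difference with step h,
    so the test is only as good as that approximation.
    """
    for t in times:
        accel = (position(t + h) - 2.0 * position(t) + position(t - h)) / (h * h)
        if abs(force(t) - mass * accel) > tol:
            return False  # the pair (q, F) violates the law at time t
    return True


if __name__ == "__main__":
    times = [0.1 * i for i in range(11)]
    q = lambda t: t * t  # position with constant acceleration 2
    print(in_newton_behavior(q, lambda t: 4.0, 2.0, times))  # consistent pair
    print(in_newton_behavior(q, lambda t: 0.0, 2.0, times))  # excluded pair
```

Both pairs (q, F) are possible trajectories in the set of all maps from T to W; the law retains the first and excludes the second.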

Classical notions such as linearity and time-invariance are also
introduced very naturally, starting from the above formal definition of
a system. In particular, we talk about a linear system if W is a vector
space and $\mathfrak{B}$ a linear subspace of $W^T$, and about a
time-invariant one (assuming $T = \mathbb{R}$ or $\mathbb{Z}$) if
$\sigma^t \mathfrak{B} = \mathfrak{B}$ for all $t \in T$, where
$\sigma^t$ denotes the t-shift defined by
$(\sigma^t f)(t') := f(t' + t)$, $t' \in \mathbb{R}$.

3. DIFFERENTIAL SYSTEMS 

As discussed in the above section, when it comes to modeling a
dynamical system, what we are really after is the behavior
$\mathfrak{B}$, the set of
admissible trajectories. Of course such a set can be described in many 
possible ways, for example through differential equations as in Newton’s 
second law, or through formal descriptions, such as in Kepler’s laws. It 
is therefore conceptually misleading to identify the idea of a system with 
that of a set of differential equations, because, as we have pointed out, 
equations are just one of many possible instruments that can be used to 
specify behaviors. As further examples, think of finite state automata 
whose behavior is typically described graphically, or non-linear electronic 
components that are often described by graphs in the I-V plane. 

Although identifying systems with equations is not appropriate, the
class of systems whose behavior is specified by differential equations
deserves special attention, because it plays such a prominent role in
physical and engineering applications. We define a differential system as
a system with $T = \mathbb{R}$, whose behavior $\mathfrak{B}$ consists of
all solutions of a set of differential equations of the form

Of even greater interest to us is the subclass of differential systems for
which W is a finite dimensional vector space, and the defining equations
are not only linear but also the variable t does not appear explicitly in
the above equation. In this case we talk about a linear time-invariant
differential system. If W is q-dimensional, say, the behavior is then
specified as all solutions to

    $$R_0 w + R_1 \frac{d}{dt} w + \cdots + R_L \frac{d^L}{dt^L} w = 0,$$

with $R_i \in \mathbb{R}^{p \times q}$, $i = 0, 1, \ldots, L$, where p
denotes the number of rows in the above system. Notice how algebraic
constraints (i.e. differential equations of order 0) are automatically
included in this class. To avoid technical issues, in the following we
regard $\mathfrak{B}$ as the set of $\mathcal{C}^\infty$ solutions of the
above set of equations, in other words we take
$\mathfrak{B} \subseteq \mathcal{C}^\infty(\mathbb{R}, \mathbb{R}^q)$.

To the above system of differential equations, we can associate in a
natural way the polynomial matrix
$R(\xi) = R_0 + R_1 \xi + \cdots + R_L \xi^L \in \mathbb{R}^{p \times q}[\xi]$,
the elements of the matrix being polynomials in $\xi$ with real
coefficients. Given this association, we often write the set of
equations as

    $$R\left(\frac{d}{dt}\right) w = 0. \qquad (1)$$

For obvious reasons, we may refer to the above as a kernel representation
of the behavior of our linear time-invariant differential system, and write

    $$\mathfrak{B} = \ker\left(R\left(\frac{d}{dt}\right)\right).$$
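
As a small worked example (an illustration, not taken from the paper), consider Newton’s second law for one-dimensional motion, with signal components $w_1$ (position) and $w_2$ (force). The law is a single equation in two variables, and its polynomial matrix can be read off directly:

```latex
w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}, \qquad
m \frac{d^2}{dt^2} w_1 - w_2 = 0
\;\Longleftrightarrow\;
R\!\left(\tfrac{d}{dt}\right) w = 0,
\\[4pt]
R(\xi) = \begin{bmatrix} m\xi^2 & -1 \end{bmatrix}
       = \underbrace{\begin{bmatrix} 0 & -1 \end{bmatrix}}_{R_0}
       + \underbrace{\begin{bmatrix} 0 & 0 \end{bmatrix}}_{R_1}\,\xi
       + \underbrace{\begin{bmatrix} m & 0 \end{bmatrix}}_{R_2}\,\xi^2 .
```

Here the matrix has one row and two columns, and the highest derivative has order L = 2.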

If we are given a linear subspace $V \subset \mathbb{R}^q$, we know from
basic linear algebra that we can always find a matrix R such that
$V = \ker(R)$; we also know that such a matrix is not uniquely defined.
Something very similar happens when looking at kernel representations of
linear differential behaviors, which are subspaces of
$\mathcal{C}^\infty(\mathbb{R}, \mathbb{R}^q)$; in order to
investigate this aspect we first recall the definition of a module over the
polynomial ring $\mathbb{R}[\xi]$.

The module spanned by a set $v_1, \ldots, v_p \in \mathbb{R}^q[\xi]$ of
polynomial vectors, denoted by $\langle v_1, \ldots, v_p \rangle$, is
defined as the set of all linear combinations with polynomial
coefficients of the given vectors. Thus we have

    $$\langle v_1, \ldots, v_p \rangle = \left\{ \sum_{i=1}^{p} h_i v_i : h_i \in \mathbb{R}[\xi] \right\} \subset \mathbb{R}^q[\xi].$$

Further, $\mathbb{R}^q[\xi]$ itself is a module, trivially obtained as
$\langle e_1, \ldots, e_q \rangle$ with $e_i$ the i-th unit vector.
Modules of the form
$\langle v_1, \ldots, v_p \rangle \subset \mathbb{R}^q[\xi]$ are
submodules of $\mathbb{R}^q[\xi]$.

The set of generators of such a submodule is not unique. In other
words there can exist elements $u_1, \ldots, u_r \in \mathbb{R}^q[\xi]$
such that $\langle v_1, \ldots, v_p \rangle = \langle u_1, \ldots, u_r \rangle$.
Because in general $p \neq r$, the cardinality of the generating set of a
given module is also not uniquely determined. The minimal cardinality is
unique, however; in other words, for any submodule
$\mathfrak{M} \subset \mathbb{R}^q[\xi]$ there is a greatest integer c
such that any generating set of the given submodule must contain at
least c elements. Generating sets with exactly c elements are called
minimal generating sets for $\mathfrak{M}$.

Given a polynomial matrix $R \in \mathbb{R}^{p \times q}[\xi]$ we will now
indicate by $\langle R \rangle$ the submodule spanned by its rows. It
turns out that, if $R'$ is also a polynomial matrix with q columns, then

    $$\ker\left(R\left(\frac{d}{dt}\right)\right) = \ker\left(R'\left(\frac{d}{dt}\right)\right) \;\Longleftrightarrow\; \langle R \rangle = \langle R' \rangle.$$

In other words, any behavior will admit many different kernel
representations, but is associated with one and only one submodule of
$\mathbb{R}^q[\xi]$, namely the module generated by the rows of one, and
therefore all, of its possible kernel representations.

This non-uniqueness in the representation of a behavior is a
consequence of our definition of a system as a set of trajectories,
rather than as a set of equations. It also has practical relevance,
because it enables us to use in each situation the representation which
we find most appropriate for the purpose at hand.

The non-uniqueness in the cardinality of generating sets for modules
implies not only that several matrices R satisfy
$\mathfrak{B} = \ker(R(\frac{d}{dt}))$, but also that the row dimension
of R is not uniquely defined (i.e. the number of equations we need to
specify a behavior is not unique). The above discussion, however,
provides a minimal number of rows, say c, that such a matrix must
contain. Any $R \in \mathbb{R}^{c \times q}[\xi]$ such that
$\mathfrak{B} = \ker(R(\frac{d}{dt}))$ will be called a minimal
representation of $\mathfrak{B}$. The minimal representations
correspond to polynomial matrices R which are of full row rank over the
ring $\mathbb{R}[\xi]$ (that is, $R \in \mathbb{R}^{c \times q}[\xi]$ has
a nonsingular $c \times c$ submatrix).
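
A simple illustration of a non-minimal representation (an example constructed here, not taken from the paper): adding to an equation its own derivative changes the matrix but neither the module nor the behavior.

```latex
R(\xi) = \begin{bmatrix} \xi & -1 \end{bmatrix}, \qquad
R'(\xi) = \begin{bmatrix} \xi & -1 \\ \xi^2 & -\xi \end{bmatrix}
\\[4pt]
\text{second row of } R' = \xi \cdot (\text{first row})
\;\Longrightarrow\; \langle R' \rangle = \langle R \rangle,
\qquad
\det R'(\xi) = -\xi^2 + \xi^2 = 0 .
```

Both matrices describe the behavior given by $\frac{d}{dt} w_1 = w_2$. The matrix $R'$ has two rows but rank one over the polynomial ring, so it is not of full row rank, whereas $R$ is a minimal representation with c = 1.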

4. MODELING THE INFLUENCE OF THE
ENVIRONMENT

When modeling systems, we will usually be dealing with “open”
systems. This means that the systems interact with the environment
around them, so the given model includes some freedom in the w’s for the
influence of the environment. From the mathematical point of view, this
will show up in the fact that, if
$\mathfrak{B} = \ker(R(\frac{d}{dt}))$, and if
$R \in \mathbb{R}^{p \times q}[\xi]$ is a full row rank (equivalently, a
minimal) representation of $\mathfrak{B}$, then p < q. In other words
the system is underdetermined, having more variables than equations.

A special case of the situation described above is obtained by looking
at systems in the traditional input-output form, which are described by
the equations

    $$P\left(\frac{d}{dt}\right) y = Q\left(\frac{d}{dt}\right) u, \qquad (2)$$

with P square, $\det(P) \neq 0$, and $P^{-1}Q$ a matrix of proper
rational functions. Such systems correspond in the notation (1) to
$R = [P \;\; -Q]$ and $w = \begin{bmatrix} y \\ u \end{bmatrix}$. It can
be shown that the u’s are free in (2), meaning that, for any given u,
there exists a y that satisfies the equations (2). In addition, the y’s
are bound, meaning that they are uniquely specified by u and by the
initial values $y(0), \frac{d}{dt} y(0), \ldots$.
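
For a concrete instance (an illustrative first-order example, not from the paper), take a scalar input and a scalar output:

```latex
\frac{d}{dt} y + y = u:
\qquad P(\xi) = \xi + 1, \quad Q(\xi) = 1, \quad
R(\xi) = \begin{bmatrix} \xi + 1 & -1 \end{bmatrix}, \quad
P^{-1}Q = \frac{1}{\xi + 1} \ \text{(proper)},
\\[4pt]
y(t) = e^{-t}\, y(0) + \int_0^t e^{-(t - s)}\, u(s)\, ds .
```

Any smooth u can be chosen freely, and once u and the initial value y(0) are fixed the output y is determined by the formula above.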



5. SIMULATING TRAJECTORIES 



As discussed above we are dealing mainly with underdetermined
systems of equations. Suppose now that we are interested in simulating a
possible system trajectory, in other words in reproducing one solution
of the system of equations (1). It is necessary to deal with the
underdetermination of the original system. A very general way of doing
so is by specifying an additional set of equations of the form

    $$U\left(\frac{d}{dt}\right) w = V\left(\frac{d}{dt}\right) f, \qquad (3)$$

with f a given $\mathcal{C}^\infty$ vector function of suitable
dimension. Typically, the choice of U, V and f corresponds to fixing the
external influences that act on the system. As a special case of the
additional equations (3), we may choose $U = [0 \;\; I]$ and $V = I$.
It follows from $w = \begin{bmatrix} y \\ u \end{bmatrix}$ that (3)
takes up the freedom in u by the classical assignment $u = f$.

Another possibility is that one might want or have to impose a set of
conditions that the system variables and their derivatives should satisfy
at a given instant in time, say t = 0. The conditions may take the form

    $$\left(S\left(\frac{d}{dt}\right) w\right)(0) = a, \qquad (4)$$

where a is a given real vector of suitable dimension.

Thus the problem becomes one of first investigating existence and
uniqueness of a solution which satisfies both the system equations (1)
and the extra constraints (3) and (4), and then providing an algorithm
to compute a (unique) solution. In the next pages we shall address the
first issue in detail and overlook the second one.

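Notice that, by setting $K = \begin{bmatrix} R \\ U \end{bmatrix}$ and
$J = \begin{bmatrix} 0 \\ V \end{bmatrix}$, equations (1) and (3) can be
written together as $K(\frac{d}{dt}) w = J(\frac{d}{dt}) f$. Slightly
generalizing the situation described above, we will therefore look at the
following problem. We are given polynomial matrices K, S and J, a real
vector a and a $\mathcal{C}^\infty$ function vector f, all of compatible
dimensions (K and S have q columns, and J has as many rows as K). These
matrices and vectors provide the following system of differential
equations with initial conditions:

    $$K\left(\frac{d}{dt}\right) w = J\left(\frac{d}{dt}\right) f, \qquad \left(S\left(\frac{d}{dt}\right) w\right)(0) = a. \qquad (5)$$

The question is to determine conditions under which there exists a
solution $w \in \mathcal{C}^\infty(\mathbb{R}, \mathbb{R}^q)$ of these
equations. The question is answered by the following theorem, already
presented in [2].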



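Theorem 1: Let the polynomial matrices K, S and J, the real vector a
and the $\mathcal{C}^\infty$ function vector f be given, with compatible
dimensions. The system of differential equations with initial
conditions (5) has a solution
$w \in \mathcal{C}^\infty(\mathbb{R}, \mathbb{R}^q)$ if and only if the
data have the two properties

    1: n a polynomial row vector and $nK = 0$
       $\;\Longrightarrow\; (nJ)\left(\frac{d}{dt}\right) f = 0$,

    2: e a real row vector, b a polynomial row vector and $eS = bK$
       $\;\Longrightarrow\; e\,a = \left((bJ)\left(\frac{d}{dt}\right) f\right)(0)$.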



The first of these conditions states that, in order to have a solution w
to $K(\frac{d}{dt}) w = J(\frac{d}{dt}) f$ for a given f, any
differential relationships which hold for the rows of K must also hold
for the corresponding components of the vector $J(\frac{d}{dt}) f$.
Moreover, the other condition states that, if a solution satisfies
$(S(\frac{d}{dt}) w)(0) = a$, then, whenever a linear combination of
the left hand side of the initial conditions can be written as a
consequence of the left hand side of the equations, the right hand side
of the initial conditions must be in the same way a consequence of the
right hand side of the equations. The first condition, therefore,
expresses consistency of the set of equations
$K(\frac{d}{dt}) w = J(\frac{d}{dt}) f$, while the second one expresses
consistency of the initial conditions with respect to the given
equations. Both conditions can be seen as generalizing the well
known rank condition for solvability of systems of algebraic equations
Ax = y, say.
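
For comparison, the algebraic rank condition itself is easy to test: Ax = y is solvable exactly when appending y to A as an extra column does not raise the rank, which is the same as requiring n y = 0 for every row vector n with nA = 0. The following Python sketch uses exact rational arithmetic; the function names are hypothetical and the code covers only the constant-matrix case, not the differential one treated in the theorem.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    mat = [[Fraction(x) for x in row] for row in rows]
    if not mat:
        return 0
    r = 0
    for c in range(len(mat[0])):
        # look for a pivot in column c among the rows not yet used
        pivot = next((i for i in range(r, len(mat)) if mat[i][c] != 0), None)
        if pivot is None:
            continue
        mat[r], mat[pivot] = mat[pivot], mat[r]
        for i in range(r + 1, len(mat)):
            factor = mat[i][c] / mat[r][c]
            mat[i] = [a - factor * b for a, b in zip(mat[i], mat[r])]
        r += 1
        if r == len(mat):
            break
    return r


def is_solvable(A, y):
    """True when Ax = y has a solution, i.e. rank(A) == rank([A | y])."""
    augmented = [list(row) + [yi] for row, yi in zip(A, y)]
    return rank(A) == rank(augmented)


if __name__ == "__main__":
    A = [[1, 2], [2, 4]]          # n = [2, -1] satisfies nA = 0
    print(is_solvable(A, [3, 6]))  # n.y = 0, so the system is consistent
    print(is_solvable(A, [3, 7]))  # n.y = -1, so the system is inconsistent
```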

A MATLAB pseudocode will be given at the end of this section that 
sketches an algorithm that checks these conditions. It requires the fol- 
lowing useful concepts. 

■ For any $K \in \mathbb{R}^{p \times q}[\xi]$, the annihilators for the
rows of K are defined to be the elements of the set

    $$\mathfrak{S}_K = \{ n \in \mathbb{R}^{1 \times p}[\xi] : nK = 0 \}.$$

Such a set is a submodule of $\mathbb{R}^{1 \times p}[\xi]$; in
algebraic literature it is called the syzygy module of the rows of K
(see [4, 1]). One can always find a polynomial matrix N such that its
rows generate $\mathfrak{S}_K$, in other words such that
$\mathfrak{S}_K = \langle N \rangle$. In [1] an algorithm is
presented which constructs a suitable N from K; such an algorithm
is part of most computer algebra packages; in our pseudocode we
assume that a procedure SYZYGY is available which performs
this computation. Condition 1 of the above theorem can then be
checked by verifying $(NJ)(\frac{d}{dt}) f = 0$.

■ Given $K \in \mathbb{R}^{p \times q}[\xi]$, we define its highest row
coefficient matrix $K_{hc}$ to be the real matrix whose i-th row
contains the coefficients of the highest power of $\xi$ that occurs in
the i-th row of K. For example,

    $$K = \begin{bmatrix} 3\xi^2 + \xi + 1 & 2 \\ 2\xi & \xi + 1 \end{bmatrix} \;\Longrightarrow\; K_{hc} = \begin{bmatrix} 3 & 0 \\ 2 & 1 \end{bmatrix}.$$

The matrix K is defined to be row proper if the rows of $K_{hc}$ are
linearly independent. If K is not row proper, its row proper form
is defined to be any matrix $K'$ such that the modules
$\langle K \rangle$ and $\langle K' \rangle$ are the same, and such that
$K'$ is row proper. Of course $K'$ is not uniquely defined, but any row
proper form can be obtained from another by taking linear combinations
of the rows. For example,
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e c + i 



if K = 




e + 1 -e 

e ^ + 1 



, then 
and K '2 = 



2^ + 1 1 
^ ^ + 1 



are both row proper forms of $K$. Because $\langle K \rangle = \langle K' \rangle$,
it follows that there exists a polynomial matrix $U$ such that $K' = UK$.
Our pseudocode uses a procedure [K', U] = ROWPROP(K) that
returns a row proper form of $K$ and the transformation matrix $U$.
Classical algorithms for doing so exist in the literature (e.g. in [5]).



■ The leading monomial matrix $K_{lm}$ of $K$ is obtained by taking
for each row the leftmost occurrence of the highest power of $\xi$
appearing in the given row. For example,

$$K = \begin{bmatrix} \xi^2 + \xi + 1 & 2 & 3\xi^2 \\ 4 & 2\xi + 1 & \xi \end{bmatrix}
\quad\Rightarrow\quad
K_{lm} = \begin{bmatrix} \xi^2 & 0 & 0 \\ 0 & 2\xi & 0 \end{bmatrix}.$$

A matrix $P$ with $q$ columns is said to be reduced with respect to
$K$ if there exists no polynomial matrix $V$ such that $P_{lm} = V K_{lm}$.
For example, the vector $P = [\,\xi \;\; 0 \;\; 0\,]$ is reduced with respect
to the above $K$ while P = [0 0] is not.

■ Given any two polynomial matrices $K$ and $S$ with the same number
of columns, a division algorithm can be designed that generates a
quotient matrix $B$ and a remainder matrix $P$ such that $S = P + BK$,
where $P$ is reduced with respect to $K$. Our pseudocode uses
a procedure [B, P] = DIVIDE(S, K), which returns the quotient
and remainder matrix of the division of $S$ by $K$. An algorithm
to perform this kind of division is described in [1], using
orderings of vector polynomials that are more general than our
concept of leading monomials. Actually, all we are doing in this
section could be recast in the more abstract language of canonical
forms for polynomial matrices as in [2], but we prefer not to do so.

■ Let $K'$ be a row proper form of $K$ and let $S$ be as in the previous
paragraph. Using the division algorithm we can find a quotient
matrix $B'$ and a remainder matrix $P$ such that $S = P + B'K'$.
Because $K' = UK$ for some matrix $U$, these remarks imply
$S = P + BK$ with $B = B'U$. We note, however, that the $P$ and $B$
obtained in this way are not necessarily the same as those that
would occur if we divided $S$ by $K$. This is a crucial and complex
issue, very well addressed in [4, 1], and which reasons of space force
us to overlook.

■ As a consequence of having divided by a row proper form of $K$, it
can now be shown that the remainder matrix $P$ has the property
that, given a real row vector $\ell$, there exists a polynomial row
vector $b$ such that $\ell S = bK$ if and only if $\ell P = 0$; in this
case we have $b = \ell B$, where $B$ is defined above. The set
$\mathfrak{L}_P$ of constant annihilators of the rows of $P$ is defined as

$$\mathfrak{L}_P = \{\, \ell \text{ a real row vector} : \ell P = 0 \,\},$$

which is a linear subspace of the space of real row vectors. Therefore
standard linear algebra techniques allow us to build a matrix $L$ whose
rows span $\mathfrak{L}_P$. In our pseudocode we assume the availability
of a procedure ANNIHIL that constructs such a matrix. Thus, remembering
$b = \ell B$, the checking of condition 2 in the above theorem is reduced to
a linear algebra problem, namely the computation of $L$ and the
verification of $La = \big(L\,B(\tfrac{d}{dt})\,J(\tfrac{d}{dt})\,f\big)(0)$.
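The matrices $K_{hc}$ and $K_{lm}$ introduced in the list above are easy to compute once a data representation is fixed. The following Python sketch is only an illustration under assumptions that are not part of the paper: a polynomial is stored as a coefficient list `[c0, c1, ...]`, a polynomial matrix as a list of rows of such lists, and all function names are hypothetical.

```python
from fractions import Fraction

# Assumed representation (not from the paper): a polynomial is a coefficient
# list [c0, c1, ...] where c_k multiplies xi**k; a matrix is a list of rows.

def poly_degree(p):
    """Degree of a coefficient list; -1 for the zero polynomial."""
    for k in range(len(p) - 1, -1, -1):
        if p[k] != 0:
            return k
    return -1

def row_degree(row):
    return max(poly_degree(p) for p in row)

def highest_row_coefficients(K):
    """K_hc: row i holds the coefficients of xi**d_i in row i of K."""
    Khc = []
    for row in K:
        d = row_degree(row)
        Khc.append([Fraction(p[d]) if d >= 0 and poly_degree(p) == d
                    else Fraction(0) for p in row])
    return Khc

def leading_monomial_matrix(K):
    """K_lm: keep only the leftmost highest-power monomial of each row."""
    Klm = []
    for row in K:
        d = row_degree(row)
        new_row = [[0] for _ in row]
        for j, p in enumerate(row):
            if d >= 0 and poly_degree(p) == d:
                new_row[j] = [0] * d + [p[d]]
                break
        Klm.append(new_row)
    return Klm

def rank(A):
    """Rank of a constant matrix by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(A[0]) if A else 0):
        piv = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, len(A)):
            f = A[i][c] / A[r][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

def is_row_proper(K):
    """K is row proper when the rows of K_hc are linearly independent."""
    return rank(highest_row_coefficients(K)) == len(K)

# The two example matrices from the text above.
K1 = [[[1, 1, 3], [2]], [[0, 2], [1, 1]]]
K2 = [[[1, 1, 1], [2], [0, 0, 3]], [[4], [1, 2], [0, 1]]]
```

Exact rational arithmetic is used here so that the rank test is not affected by rounding; a floating-point implementation would need a tolerance instead.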

With these remarks at hand, we can now sketch the desired algorithm: 

    Solv = SOLVABILITY(K, J, S, f, a);
    Solv = 0;
    N = SYZYGY(K);
    if N(d/dt) J(d/dt) f == 0 then
        Solv = 1;
        [K', U] = ROWPROP(K);
        [B', P] = DIVIDE(S, K');
        L = ANNIHIL(P);
        B = B'U;
        if La ≠ (L B(d/dt) J(d/dt) f)(0) then Solv = 0 endif;
    endif.

The procedure returns Solv=0 if the problem has no solution, Solv=1
otherwise.
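The ANNIHIL step is plain linear algebra: a real row vector $\ell$ satisfies $\ell P = 0$ for a polynomial matrix $P$ exactly when it annihilates every coefficient matrix of $P$. A minimal Python sketch of this idea follows; the coefficient-list representation of polynomials and the function names are assumptions made for illustration, not the authors' implementation.

```python
from fractions import Fraction

def left_null_space(A):
    """Basis (as rows) of {l : l A = 0} for a constant matrix A."""
    m = len(A)
    n = len(A[0]) if m else 0
    # Row-reduce [A | I]; rows whose A-part vanishes carry an annihilator.
    aug = [[Fraction(x) for x in A[i]] +
           [Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    r = 0
    for c in range(n):
        piv = next((i for i in range(r, m) if aug[i][c] != 0), None)
        if piv is None:
            continue
        aug[r], aug[piv] = aug[piv], aug[r]
        for i in range(m):
            if i != r and aug[i][c] != 0:
                f = aug[i][c] / aug[r][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[r])]
        r += 1
    return [row[n:] for row in aug[r:]]

def constant_annihilators(P):
    """Rows l with l P = 0, where P is a matrix of coefficient lists."""
    d = max(len(p) for row in P for p in row)
    # Stack the coefficients of every power of xi side by side.
    stacked = [[(p[k] if k < len(p) else 0) for p in row for k in range(d)]
               for row in P]
    return left_null_space(stacked)

# Hypothetical remainder matrix P = [xi 1; 2*xi 2; 0 1]: row 2 is twice row 1.
P = [[[0, 1], [1]], [[0, 2], [2]], [[0], [1]]]
L = constant_annihilators(P)
```

For an identity remainder, as in the Cauchy problem treated next, the function returns an empty list, which corresponds to $L = 0$.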

We close this section by showing how the classical Cauchy problem 

fits into our framework. 

Example 2 : Consider the first order system:

$$\tfrac{d}{dt}x = Ax + Bf, \qquad x(0) = a,$$

which corresponds, in our notation, to $K = \xi I - A$, $S = I$ and $J = B$.

Because $K$ is square and non-singular, the set $\mathfrak{N}_K$ of
annihilators of the rows of $K$ contains only the zero vector, so $N = 0$.
We see that $K$ is row proper and that the division of $S$ by $K$ gives
$P = S = I$ and $B = 0$, so $\mathfrak{L}_P$ also contains only the zero
vector, i.e. $L = 0$. Therefore, by applying Theorem 1, we conclude the
existence of a solution $x$ for arbitrary $f$ and $a$.

6. OBSERVABILITY 

As announced in the introduction, the purpose of the final two sections 
of this paper is recasting the classical concepts of observability and con- 
trollability in the language we have used to describe dynamical systems; 
for the case of linear differential systems, we shall present algorithms 
which allow us to verify their properties in terms of the coefficients of 
the defining equations. 

Traditionally, in the context of input/state/output systems of the form
$\tfrac{d}{dt}x = f(x,u)$, $y = h(x,u)$, observability is defined as the
possibility of deducing, knowing the laws of the system, the state
trajectory $x(\cdot)$ from observation of the input and output trajectories
$u(\cdot)$ and $y(\cdot)$. In our context, however, no special variables
such as the state show up in the definition of observability, which is now
seen as the possibility of deducing the trajectories of a subset of the
system variables $w$, given the laws of the system and observations of the
remaining variables, which form a second subset. The first subset is often
referred to as to-be-observed variables and denoted by $w_2$, while the
latter is often referred to as observed variables and denoted by $w_1$.

Formally, let $(T, W_1 \times W_2, \mathfrak{B})$ be a dynamical system for
which the splitting of the signal space $W$ into $W_1 \times W_2$
corresponds to the separation of variables into observed and to-be-observed
variables. We denote a typical element of the behavior by
$w = (w_1, w_2) \in \mathfrak{B}$, and we define $w_2$ to be observable
from $w_1$ if

$$(w_1, w_2') \in \mathfrak{B} \ \text{ and } \ (w_1, w_2'') \in \mathfrak{B} \ \Rightarrow\ w_2' = w_2''.$$

In the case when $\mathfrak{B}$ is a linear behavior, this is equivalent to
the condition $(0, w_2) \in \mathfrak{B} \Rightarrow w_2 = 0$.

When discussing observability for $\mathfrak{B} = \ker(R(\tfrac{d}{dt}))$,
it is convenient to partition $R$ into two parts that correspond to its
action on the $w_1$ and $w_2$ subsets. Therefore we rewrite the behavioral
equations $R(\tfrac{d}{dt})w = 0$ as

Before giving conditions for observability we recall two definitions. A
square polynomial matrix $G$ is said to be unimodular if it admits a
polynomial inverse, in other words if there exists a polynomial matrix $F$
such that $FG = GF = I$; this is equivalent to
$\det(G) = g \in \mathbb{R}\setminus\{0\}$. A polynomial matrix $R$ is said
to be left prime if $R = GR'$, with $G$ and $R'$ polynomial matrices,
implies that $G$ is unimodular; the definition of right primeness is
analogous.

We can now state a theorem (see [3]) that presents equivalent condi- 
tions for observability. 

Theorem 3 : Let $\mathfrak{B}$ be the set



® = {«; = 



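and assume $M \in \mathbb{R}^{p\times\ell}[\xi]$. Then the following properties are equivalent:

1 $w_2$ is observable from $w_1$,

2 There exist unimodular matrices $U$ and $V$ such that
$UMV = \begin{bmatrix} I \\ 0 \end{bmatrix}$,

3 $\mathrm{rank}(M(\lambda)) = \ell$ for any $\lambda \in \mathbb{C}$,

4 $M$ is right prime,

5 The rows of $M$ span the full module $\mathbb{R}^{1\times\ell}[\xi]$.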

Condition 5 of Theorem 3 can be restated as follows: consider the
set of row vectors of degree 0 contained in the module spanned by the
rows of $M$; such a set is obviously an $\mathbb{R}$-vector space contained
in $\mathbb{R}^{1\times\ell}$, and, in order for $M$ to be right prime, it
must be $\mathbb{R}^{1\times\ell}$ itself. We now sketch an algorithm for
investigating such a condition. It recursively builds a generating set for
the above vector space, and checks whether the dimension of the space is
$\ell$. We employ the following notation.

■ Standard MATLAB notation will be used to indicate the rows and
columns of a matrix, e.g. $M(i,:)$ is the $i$-th row of $M$, and
$M(i:j,:)$ are rows $i$ to $j$ of $M$.

■ $M^0$ denotes the set of rows of the matrix $M$ that have degree 0.
For example

$$M = \begin{bmatrix} \xi^2 + \xi + 1 & 2 \\ 1 & 1 \end{bmatrix}
\quad\Rightarrow\quad M^0 = [\,1 \;\; 1\,].$$

■ $d_i$ is the degree of the $i$-th row of $M$, so in the above example we
have $d_1 = 2$ and $d_2 = 0$.

■ $M_{hc} \in \mathbb{R}^{p\times\ell}$ is the highest row coefficient
matrix of $M$ as defined in the preceding section. Further,
$M_{hp} \in \mathbb{R}^{p\times\ell}[\xi]$ is the highest row power matrix
of $M$, meaning that, for $i = 1, 2, \ldots, p$, the row $M_{hp}(i,:)$
contains just the highest power of $\xi$ in $M(i,:)$ with its coefficients.
Therefore $M_{hp}(i,:)$ is the product $\xi^{d_i} M_{hc}(i,:)$; in
the above example we have

$$M_{hc} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}
\quad\text{and}\quad
M_{hp} = \begin{bmatrix} \xi^2 & 0 \\ 1 & 1 \end{bmatrix}.$$



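■ M=Order(M) is a procedure that permutes the rows of $M$ into
decreasing row degree order. For example

$$M = \begin{bmatrix} \xi & 1 \\ \xi^2 + \xi + 1 & 2 \\ 1 & 1 \end{bmatrix}
\quad\Rightarrow\quad
\mathrm{Order}(M) = \begin{bmatrix} \xi^2 + \xi + 1 & 2 \\ \xi & 1 \\ 1 & 1 \end{bmatrix}.$$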



■ Let the $p \times \ell$ matrix $M$ be ordered by Order(M), and then let
$M_{hc}$ be formed. For any integer $i$ between 1 and $p - 1$, standard
linear algebra allows us to find, if it exists, a non-zero real vector
$n$ such that $M_{hc}(i,:) = n\,M_{hc}(i+1:p,:)$. For example,

$$M = \begin{bmatrix} \xi^2 + \xi + 1 & \xi^2 + 2 \\ \xi & 1 \\ 0 & 1 \end{bmatrix}
\quad\Rightarrow\quad M_{hc}(1,:) = n\,M_{hc}(2:3,:)$$

for $n = [\,1 \;\; 1\,]$. Given such an $n$, the function h=polann(n, M)
returns a polynomial vector $h$ such that
$M_{hp}(i,:) = h\,M_{hp}(i+1:p,:)$. In the above case, for example,
polann(n, M) returns $h = [\,\xi \;\; \xi^2\,]$. To build $h$ from $n$ is
straightforward, because, for $k = i+1, \ldots, p$, the element of $h$ that
multiplies the $k$-th row of $M_{hp}$ is the corresponding element of $n$
times $\xi^{d_i - d_k}$.



■ M=Eliminate(M, i) is a procedure that removes the $i$-th row of
a matrix $M$. For example, if

$$M = \begin{bmatrix} \xi^2 + \xi + 1 & \xi^2 + 2 \\ \xi & 1 \\ 0 & 1 \end{bmatrix}$$

then Eliminate(M, 2) returns

$$M = \begin{bmatrix} \xi^2 + \xi + 1 & \xi^2 + 2 \\ 0 & 1 \end{bmatrix}.$$

■ Let $m$ be a polynomial row vector of degree $d_m$ and length $\ell$, and
let $M \in \mathbb{R}^{p\times\ell}[\xi]$ be a matrix ordered as by Order;
moreover, let $j$ be such that the degree of $M(i,:)$ is less than or equal
to $d_m$ for $i \geq j$ and greater than $d_m$ for $i \leq j - 1$. Then
[M, j]=Insert(M, m) is a procedure that replaces the matrix $M$ by the
matrix with the $p + 1$ rows $(M(1:j-1,:),\, m,\, M(j:p,:))$, where, as
usual, $p$ is the number of rows of the original $M$. The procedure also
returns $j$, which is the row index of $m$ in the new matrix. For example, if

$$M = \begin{bmatrix} \xi^2 + \xi + 1 & \xi^2 + 2 \\ 1 & 1 \end{bmatrix}
\quad\text{and}\quad m = [\,\xi + 1 \;\; \xi\,],$$

then Insert(M, m) returns

$$M = \begin{bmatrix} \xi^2 + \xi + 1 & \xi^2 + 2 \\ \xi + 1 & \xi \\ 1 & 1 \end{bmatrix}
\quad\text{and}\quad j = 2.$$



We can now sketch (in MATLAB pseudo-code) our procedure for check- 
ing right primeness (RPR) of a matrix M, and hence observability of a 
differential system. 

    [M, obs] = RPR(M);
    M = Order(M);
    obs = (rank(M^0) == ℓ);
    p = rowdim(M);
    i = p - rowdim(M^0);
    while ((not obs) and (i ≥ 1)) do
        if (∃ real n ≠ 0 such that M_hc(i,:) = n M_hc(i+1:p,:)) then
            h = polann(n, M);
            m = M(i,:) - h M(i+1:p,:);
            M = Eliminate(M, i);
            if (m ≠ 0) then
                [M, j] = Insert(M, m);
                if (degree(m) == 0) then
                    obs = (rank(M^0) == ℓ);
                    i = j - 1;
                else i = j endif;
            else p = p - 1; i = i - 1 endif;
        else i = i - 1;
        endif;
    endwhile.

In the above algorithm obs is a boolean variable that tells us whether
the matrix we are considering is right prime or not; we call this variable
obs because of the relation between right primeness and observability
explained in Theorem 3. As already discussed, right primeness is checked
by verifying whether a generating set for the vectors of degree 0 contained
in the module spanned by the rows of $M$ has rank equal to $\ell$. After
reordering $M$, we immediately perform a check to see whether the vectors of
degree 0 in the original matrix are already sufficient to meet the
requirement.

If this is not the case, after setting $i = p - \mathrm{rowdim}(M^0)$, which
gives the index of the first row from the bottom with degree higher than 0,
we enter the main while loop in which we try to generate additional constant
vectors by taking combinations of rows of $M$. This is done by replacing
a row $M(i,:)$ of degree higher than 0 by a polynomial combination of
$M(i,:)$ and later rows of $M$, provided that the combination is of lower
degree than $M(i,:)$ itself. Such a degree lowering is possible only if the
highest coefficient vector of $M(i,:)$ is linearly dependent on the highest
coefficient vectors of the rows of equal or lower degree. This condition
is tested in the “if” statement by looking for a real vector $n \neq 0$ of
suitable dimension such that $M_{hc}(i,:) = n\,M_{hc}(i+1:p,:)$. If such
a dependence is found, then starting from $n$ we build the polynomial
vector $h$ such that $M_{hp}(i,:) = h\,M_{hp}(i+1:p,:)$. Thus the polynomial
vector $m = M(i,:) - h\,M(i+1:p,:)$ has degree lower than $M(i,:)$, as
desired.

If such a degree lowering is found to be possible, we then eliminate 
M(i, :). Further, when the new row vector m is not zero, it is included in 
the matrix M in the earliest position that maintains the degree ordering 
of Order. If the new vector has degree 0, we check again if the right 
primeness condition is fulfilled; alternatively, the next iteration of the 
while loop will check whether the degree of this newly generated vector 
can itself be lowered (this is the reason for having i = j). 

The algorithm ends if the condition for right primeness is verified (obs
is true) or if no more lowering of degree is possible. When the condition
$i < 1$ occurs in the while statement, this means that we have considered
all the rows in $M$, so there is no possibility of further lowering. Because
at each step we are replacing a vector by one of lower degree, or are
reducing the number of rows of $M$, the $i < 1$ condition will always hold
eventually. Therefore the stopping rule for the algorithm is well defined.
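To make the control flow of RPR concrete, the pseudocode can be transcribed into Python roughly as follows. This is a sketch under assumptions that go beyond the paper: polynomials are coefficient lists with exact rational entries, indices are 0-based, zero rows are discarded at the start, and the helper names (`trim`, `hc`, `solve_left`, `sub_shifted`, `rpr`) are hypothetical.

```python
from fractions import Fraction

def trim(p):
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p                                   # the zero polynomial is []

def row_deg(row):                              # -1 for the zero row
    return max(len(p) - 1 for p in row)

def hc(row):                                   # highest coefficient vector
    d = row_deg(row)
    return [p[d] if len(p) - 1 == d else Fraction(0) for p in row]

def rank(A):
    A, r = [list(row) for row in A], 0
    for c in range(len(A[0]) if A else 0):
        piv = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, len(A)):
            f = A[i][c] / A[r][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

def solve_left(A, b):
    """Return n with n A = b, or None when b is not in the row span of A."""
    k, l = len(A), len(b)
    aug = [[A[r][c] for r in range(k)] + [b[c]] for c in range(l)]
    pivots, r = [], 0
    for c in range(k):
        piv = next((i for i in range(r, l) if aug[i][c] != 0), None)
        if piv is None:
            continue
        aug[r], aug[piv] = aug[piv], aug[r]
        pv = aug[r][c]
        aug[r] = [x / pv for x in aug[r]]
        for i in range(l):
            if i != r and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
    if any(row[k] != 0 for row in aug[r:]):
        return None
    n = [Fraction(0)] * k
    for idx, c in enumerate(pivots):
        n[c] = aug[idx][k]
    return n

def sub_shifted(row, coeff, shift, other):
    """Return row - coeff * xi**shift * other for polynomial row vectors."""
    out = []
    for p, q in zip(row, other):
        r = [Fraction(0)] * max(len(p), len(q) + shift)
        for k, c in enumerate(p):
            r[k] += c
        for k, c in enumerate(q):
            r[k + shift] -= coeff * c
        out.append(trim(r))
    return out

def rpr(M):
    """Right primeness test following the RPR pseudocode; returns (M, obs)."""
    M = [[trim(p) for p in row] for row in M]
    l = len(M[0])
    M = [row for row in M if row_deg(row) >= 0]   # assumption: drop zero rows
    M.sort(key=row_deg, reverse=True)             # Order(M), stable sort
    def constants():                              # the rows M^0 of degree 0
        return [hc(row) for row in M if row_deg(row) == 0]
    obs = rank(constants()) == l
    i = len(M) - len(constants()) - 1             # last row of positive degree
    while not obs and i >= 0:
        below = M[i + 1:]
        n = solve_left([hc(row) for row in below], hc(M[i]))
        if n is None:
            i -= 1
            continue
        d_i, m = row_deg(M[i]), M[i]
        for n_k, row in zip(n, below):            # m = M(i,:) - h M(i+1:p,:)
            m = sub_shifted(m, n_k, d_i - row_deg(row), row)
        del M[i]                                  # Eliminate(M, i)
        if row_deg(m) < 0:
            i -= 1
            continue
        j = next((k for k, row in enumerate(M)
                  if row_deg(row) <= row_deg(m)), len(M))
        M.insert(j, m)                            # Insert(M, m)
        if row_deg(m) == 0:
            obs = rank(constants()) == l
            i = j - 1
        else:
            i = j
    return M, obs

# M = [-xi*I + A; C] with A = [[0, 1], [0, 0]] and two choices of C.
M_obs = [[[0, -1], [1]], [[], [0, -1]], [[1], []]]      # C = [1 0]
M_unobs = [[[0, -1], [1]], [[], [0, -1]], [[], [1]]]    # C = [0 1]
```

The two test matrices correspond to an observable and an unobservable pair $(A, C)$ respectively, so `rpr(M_obs)` should report right primeness while `rpr(M_unobs)` should not.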

We see that the final matrix M is an output argument of our proce- 
dure. This is not needed when the only output of interest is obs; it will, 
however, be handy in the next section, where RPR is a subprocedure 
of a procedure that checks controllability. 

As an example of the above algorithm, we wish to show its application 
to the observability matrix for state space systems. 

Example 4 : We consider the classical problem of deducing the state
trajectory $x(\cdot)$, starting from observations of the input and output
trajectories $u(\cdot)$ and $y(\cdot)$ in the state space system

$$\tfrac{d}{dt}x = Ax + Bu, \qquad y = Cx + Du.$$

Such a problem corresponds, in our formalism, to

$$w_1 = (u, y), \quad w_2 = x \quad\text{and}\quad
M = \begin{bmatrix} -\xi I + A \\ C \end{bmatrix},$$

where $M$ is a polynomial matrix with $\ell$ columns and $I$ is the
$\ell \times \ell$ identity matrix.

The algorithm provides an efficient way of checking the classical rank
condition on the observability matrix

$$\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\ell-1} \end{bmatrix}.$$

To see this, note that the rows of this matrix are vectors of degree 0
contained in the module spanned by the rows of $M$. Specifically, the rows
of $CA$ can be expressed as $(\xi I)C + C(-\xi I + A)$, and are therefore
polynomial linear combinations of degree 0 of the rows of $M$. By induction,
one finds that the rows of
$CA^k = (\xi I)CA^{k-1} + CA^{k-1}(-\xi I + A)$ are also 0 degree
vectors in the module spanned by the rows of $M$. Further, the rows
of the observability matrix are actually a generating set for the space
of all 0 degree vectors spanned by the rows of $M$; this remark follows
easily from the fact that $A^\ell$ is linearly dependent on
$I, A, \ldots, A^{\ell-1}$, so no independent row vectors would be added by
considering $CA^k$ for $k \geq \ell$.

Checking that the observability matrix has rank $\ell$ is therefore
equivalent to verifying that a generating set for the space of vectors of
degree 0 contained in the module spanned by the rows of $M$ has rank $\ell$.
As discussed above, this task is done by our algorithm in an efficient way.
For example, consider the case when $C$ is a non-singular matrix. Then, for

$$M = \begin{bmatrix} -\xi I + A \\ C \end{bmatrix},$$

observability is found immediately without computing the rest of the
observability matrix; in this case, in fact, our algorithm stops without
even entering the main “while” loop.
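For comparison, the classical test that the example refers to can be written down directly. The sketch below builds the observability matrix and computes its rank with exact rational arithmetic; the function names are illustrative and the matrices are small made-up examples.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two constant matrices given as lists of rows."""
    return [[sum(Fraction(x) * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def rank(A):
    """Rank by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(A[0]) if A else 0):
        piv = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, len(A)):
            f = A[i][c] / A[r][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(l-1), where l is the state dimension."""
    rows, block = [], [list(r) for r in C]
    for _ in range(len(A)):
        rows.extend(block)
        block = matmul(block, A)
    return rows

def is_observable(A, C):
    return rank(observability_matrix(A, C)) == len(A)

A = [[0, 1], [0, 0]]
```

With this $A$, measuring the first state (`C = [[1, 0]]`) gives an observable pair, while measuring only the second state (`C = [[0, 1]]`) does not.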



7. CONTROLLABILITY 



In the context of state space systems, the concept of controllability
addresses the possibility of reaching any final state starting from any
initial state in finite time. The state system $\tfrac{d}{dt}x = f(x,u)$ is
said to be controllable if, for any initial state $x_0$ and any final state
$x_f$, there exists an input function $u$ and a time $T$ such that the
solution to $\tfrac{d}{dt}x = f(x,u)$ with initial condition $x(0) = x_0$
yields $x(T) = x_f$.

As in the case of observability, we now give a definition of controllabil- 
ity which relies only on properties of the system’s trajectories, and not 
on specific properties of special variables chosen to represent it, namely 
state variables in the above example. 

In particular, if $\mathfrak{B}$ is a continuous-time, time invariant
behavior, we define it to be controllable if, for any two trajectories
$w_1, w_2 \in \mathfrak{B}$, there exists $t_1 \geq 0$ and a third
trajectory $w \in \mathfrak{B}$ such that

$$w(t) = \begin{cases} w_1(t), & t < 0, \\ w_2(t - t_1), & t \geq t_1. \end{cases}$$

The intuition behind this definition is that, for a behavior to be con- 
trollable, one must be able to connect any admissible “undesired” past 
trajectory to any admissible “desired” future one, through suitable steer- 
ing. 

The concepts of controllability and observability are of central
importance in systems theory, in particular in controller design and
stabilization [10, 5], and in observer design and filtering [5],
respectively. Very roughly speaking, controllability implies that a system
can be stabilized by control. More precisely, the system, viewed as a plant,
can be interconnected with a controller such that all solutions that satisfy
the equations of both the plant and the controller go to zero (at an
arbitrarily fast rate) as time goes to infinity. Observability in turn
implies the existence of a signal processor that deduces, in a suitable way,
the to-be-estimated variables (the $w_2$'s in the definition of
observability) from the observed variables (the $w_1$'s in the definition of
observability).

Again, as for observability, we now concentrate on the case of linear
differential systems $\mathfrak{B} = \ker(R(\tfrac{d}{dt}))$. We seek
conditions that are equivalent to controllability in terms of the polynomial
matrix $R$.

Unfortunately, we have to examine two situations separately: the case
in which $R$ is of full row rank (equivalently, it provides a minimal kernel
representation of $\mathfrak{B}$) and the case in which it is not. Notice
that in the observability case any attempt to observe “too many” variables
(i.e. $M$ does not have full column rank) is excluded by Theorem 3, whereas
using “too many” equations to describe $\mathfrak{B}$ (i.e. $R$ does not
have full row rank) may allow controllability of $\mathfrak{B}$, even
though, as we shall see, the conditions on $R$ become less elegant.

The above remark may seem strange to anyone used to the old princi- 
ple that controllability and observability are dual concepts; in the setting 
we are working in we abandon this adage. Indeed, our definitions show 
that, while controllability is essentially a property of the behavior, ob- 
servability also depends on the choice of observed and to-be-observed 
variables. The fact that technical conditions for checking observability 
and controllability often turn out to be “dual” (in some sense) should 
not make us forget the fundamental difference just mentioned. 

We now state two theorems (see [3]) which present conditions for 
controllability in terms of the polynomial matrix R. The first result 
applies to the general case. 



Theorem 5 : The following are equivalent:

1 $\mathfrak{B} = \ker(R(\tfrac{d}{dt}))$ is controllable,

2 There exist unimodular matrices $U$ and $V$ such that
$URV = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}$,

3 $\mathrm{rank}(R(\lambda))$ is independent of $\lambda$ for $\lambda \in \mathbb{C}$,

4 If $N(\xi)$ is a minimal generating set for the module spanned by the
columns of $R$, then $N(\xi)$ is a right prime matrix.

In the case when R has full row rank, much more can be said, this time 
yielding a result which can be regarded as the “dual” of Theorem 3. 

Theorem 6 : Let $R(\xi)$ be a polynomial matrix with $p$ rows and full row
rank. The following are equivalent:

1 $\mathfrak{B} = \ker(R(\tfrac{d}{dt}))$ is controllable,

2 There exist unimodular matrices $U$ and $V$ such that $URV = [\,I \;\; 0\,]$,

3 $\mathrm{rank}(R(\lambda)) = p$ for any $\lambda \in \mathbb{C}$,

4 $R$ is left prime,

5 The columns of $R$ span the full module $\mathbb{R}^{p}[\xi]$.

We now sketch an algorithm that assesses controllability for a given
$\mathfrak{B} = \ker(R(\tfrac{d}{dt}))$.

To begin with, notice that a matrix is left prime if and only if its
transpose is right prime. Therefore, given the algorithm of the preceding
section, it is easy to build a procedure LPR which checks whether a
matrix $R$ is left prime (equivalently whether $\ker(R(\tfrac{d}{dt}))$ is
controllable under the assumptions of Theorem 6). We can use

    [R, ctr] = LPR(R);
    M = R^T;
    [M, ctr] = RPR(M);
    R = M^T.

We know, however, that left primeness is equivalent to controllability
only when $R$ is a full row rank matrix; to check controllability in the
general case we have to verify the conditions of Theorem 5. In order to
sketch an algorithm that does so, we make two remarks.

1 A column proper form of a matrix $R$ is obtained just by transposing
the row proper form of $R^T$, with row proper defined in Section 5.
If LPR returns ctr==false, then the matrix $R$ which is returned is
very close to a column proper form of the original $R$. Indeed, if left
primeness is not verified, then the algorithm stops when the highest
column coefficient vectors of all columns with degree higher than 0
are linearly independent of the highest column coefficient vectors
of the subsequent columns. For vectors of degree 0, however, we
always check their rank but not their independence, so they need
not be linearly independent.

The columns with degree higher than 0, therefore, already satisfy
the property that defines the column proper form of a matrix $R$,
so to get a column proper form it is sufficient to replace the degree
0 columns of the returned $R$ by a basis of the $\mathbb{R}$-vector space
they generate. The procedure R=COLPRP(R) brings the returned $R$
into column proper form in this way.

2 If $R$ is in column proper form, then the number of its columns
is equal to the rank of the original matrix $R$. Therefore, if this
number is equal to the row dimension of $R$, then $R$ is of full row
rank, so, by Theorem 6, the test performed by the call of LPR is
necessary and sufficient for controllability.

Alternatively, if the rank is smaller than the row dimension of $R$,
we apply Condition 4 of Theorem 5 to check controllability. In this
case the column proper form is a minimal generating set for the
module spanned by the columns of $R$. Therefore we need to verify
that the column proper form is right prime in order to conclude
controllability.

The above remarks provide the following algorithm for checking control- 
lability. 

    [R, ctr] = CTRB(R);
    [R, ctr] = LPR(R);
    if (not ctr) then
        R = COLPRP(R);
        if (rowdim(R) > coldim(R)) then
            [R, ctr] = RPR(R);
        endif;
    endif.

As for observability, it can be shown that our algorithm corresponds
to the usual test on the controllability matrix, in the case of checking
controllability of a state space system, where $R$ has the form
$R = (\xi I - A \;\; -B)$. We now show how it reproduces a known test for
controllability of systems described by a single differential equation.

Example 7 : Assume $R = (r_1 \;\; r_2)$, where $r_1$ and $r_2$ are
polynomials and $d_1 = \mathrm{degree}(r_1) \geq d_2 = \mathrm{degree}(r_2)$.
By applying a division algorithm for polynomials, we can write
$r_1 = q_2 r_2 + r_3$ with
$d_2 = \mathrm{degree}(r_2) > d_3 = \mathrm{degree}(r_3)$. Going through our
algorithm we see that, after at most $d_1 - d_2 + 1$ steps, it will yield
$R = (r_2 \;\; r_3)$. Similarly, we write $r_2 = q_3 r_3 + r_4$, and after
at most $d_2 - d_3 + 1$ more steps it will provide $R = (r_3 \;\; r_4)$.
Thus our algorithm applies the classical Euclidean algorithm for computing
the greatest common divisor of two polynomials. Therefore at the end of the
calculation we will have $R = r$ with $r = \mathrm{GCD}(r_1, r_2)$, so the
condition for controllability is equivalent to asking if $r$ is a constant,
which is equivalent to $r_1$ and $r_2$ being coprime polynomials.
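The coprimeness test of Example 7 can be tried out with a few lines of Python. The sketch below runs the Euclidean algorithm on coefficient lists with exact rational arithmetic; the representation and the function names are assumptions made for illustration, and the two polynomials are assumed not to be both zero.

```python
from fractions import Fraction

def trim(p):
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p                      # the zero polynomial is []

def poly_divmod(a, b):
    """Quotient and remainder of a by a non-zero polynomial b."""
    a, b = trim(a), trim(b)
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    r = a[:]
    while len(r) >= len(b):
        s = len(r) - len(b)       # degree gap
        c = r[-1] / b[-1]         # cancels the leading term of r
        q[s] = c
        for k, bk in enumerate(b):
            r[k + s] -= c * bk
        r = trim(r)
    return q, r

def poly_gcd(r1, r2):
    """Monic greatest common divisor by the Euclidean algorithm."""
    a, b = trim(r1), trim(r2)
    while b:
        _, rem = poly_divmod(a, b)
        a, b = b, rem
    return [c / a[-1] for c in a]

def is_controllable(r1, r2):
    """R = (r1 r2) passes the test exactly when the GCD is a constant."""
    return len(poly_gcd(r1, r2)) == 1
```

For instance, $r_1 = \xi^2 + 1$ and $r_2 = \xi$ are coprime, whereas $r_1 = \xi^2 - 1$ and $r_2 = \xi - 1$ share the factor $\xi - 1$.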

8. CONCLUSIONS 

In this paper we have discussed some basic concepts of the behavioral
approach to dynamical modelling. We have treated in particular
the simulation question, i.e. the problem of adding the specification of
externally given signals and internally given initial data, so that the
system has a (unique) solution. Further, we elaborated on the notions of
controllability and observability in this setting.

Our presentation has concentrated on lumped linear differential systems.
Present work aims at extending these ideas to distributed and
nonlinear systems, and at extending the range of applications,
particularly in the area of $H_\infty$ control and filtering.

Abstract We present an overview of Lipschitz-type properties of mappings
associated with solutions of optimization problems including variational
inequalities and mathematical programs. We show that these properties
are inherited in various ways by the mapping acting from parameters
of the problem and the starting point to the set of sequences generated
by Newton’s method. Some new insights into convergence of the
Newton/SQP method are also presented.

Keywords: Newton’s method, stability, variational inequalities, optimization.

1. INTRODUCTION 

In this paper we study continuity properties of solutions to variational
problems and associated properties of sequences generated by Newton’s
method. Our basic model is the following inclusion (generalized equation)
with a parameter

$$v \in f(x) + F(x), \qquad (1)$$

where $v \in \mathbb{R}^m$, $x \in \mathbb{R}^n$, $f$ is a function from
$\mathbb{R}^n$ to $\mathbb{R}^m$ and $F$ is a set-valued map from
$\mathbb{R}^n$ to the subsets of $\mathbb{R}^m$ with closed graph. For
$F(x) = \{0\}$ we obtain a system of equations; for $F(x) = \mathbb{R}^m_+$,
the positive orthant in $\mathbb{R}^m$, we have a system of inequalities. If
$F(x) = N_C(x)$, the normal cone mapping to a closed set $C$ in
$\mathbb{R}^n$, for $m = n$, then we have a variational inequality; in
particular, for $C = \mathbb{R}^n_+$ we obtain a complementarity problem.

The parameter $v$ provides a shift perturbation to the inclusion (1).
Most of the results presented in this paper can be extended to problems
with a basic perturbation parameter $u$, added to the function $f$ and to
the mapping $F$, thus regarding (1) as embedded in a larger family of
inclusions,

$$v \in f(x,u) + F(x,u).$$

The pair $(v, u)$ now represents canonical perturbations. In general, shift
stability of (1) with respect to $v$ is different from full (canonical)
stability with respect to $(u, v)$; in some cases these concepts are
equivalent under additional conditions for the dependence of $f$ and $F$ on
the “generic” parameter $u$. In this paper we stay within the format of
shift stability of the model (1). We study local Lipschitz-type properties
of the map “parameter $v$ $\mapsto$ set of solutions of (1)”, denoted $S$;
that is,

$$v \mapsto S(v) = \{\, x \in \mathbb{R}^n \mid v \in f(x) + F(x) \,\}, \qquad (2)$$

around a fixed reference point $(v^*, x^*)$ in the graph of $S$. Of course,
$S$ is the inverse map $(f + F)^{-1}$.

For illustration of our results we consider throughout the following
optimization problem:

$$\min_x \; [\, g(x) - \langle v, x \rangle \,] \quad \text{subject to } x \in C, \qquad (3)$$

where $g$ is a function from $\mathbb{R}^n$ to $\mathbb{R}$, $C$ is a convex
polyhedral set in $\mathbb{R}^n$ and $v$ is a parameter which provides
“tilting” perturbations to the cost, in the terminology of [17]. The
first-order necessary optimality condition for (3) has the form

$$v \in \nabla g(x) + N_C(x), \qquad (4)$$

where $\nabla g$ is the derivative of $g$ and $N_C(x)$ denotes the normal
cone to the set $C$ at the point $x$. The variational inequality (4) fits
into the general format of (1).

By Newton’s method applied to the basic model (1) we mean the
procedure which generates a sequence $\{x_1, x_2, \ldots\}$ of points $x_n$,
with a given $x_0$, according to the rule

$$v \in f(x_n) + \nabla f(x_n)(x_{n+1} - x_n) + F(x_{n+1}) \quad \text{for } n = 0, 1, \ldots. \qquad (5)$$

This procedure is the standard Newton method for equations when
$F = \{0\}$; it is usually called a generalized Newton method or a
Newton-type method for solving systems of inequalities when
$F = \mathbb{R}^m_+$ or variational inequalities when $F$ is a normal cone
mapping, respectively. One may expect that the “true” Newton method for the
general model (1) should involve some kind of linearization of the map $F$,
and a number of authors have followed this line of research for various
nonsmooth, semismooth, etc. equations, see, e.g., the recent survey in [26].
Here we keep the map $F$ unchanged with the mental picture that $F$ has a
“simple” yet set-valued form, e.g. it is a polyhedral map or the normal map
to a polyhedral cone, both having already a linear structure. Such a model
leads us quite far, also for infinite-dimensional (i.e., optimal control)
problems where nonsmooth-analytic tools may become more technical than
instrumental. Most of the results in this paper hold in abstract spaces, by
changing only the terminology.
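In the simplest case $F = \{0\}$ and $m = n = 1$, rule (5) reads $v = f(x_n) + f'(x_n)(x_{n+1} - x_n)$ and can be iterated directly. The following Python sketch illustrates this special case only; the test function $f(x) = x^3$, the starting point and the function name are arbitrary choices made for the illustration.

```python
def newton_sequence(f, df, v, x0, steps=8):
    """Scalar Newton sequence for f(x) = v, i.e. rule (5) with F = {0}."""
    xs, x = [], x0
    for _ in range(steps):
        # Solve v = f(x_n) + f'(x_n) (x_{n+1} - x_n) for x_{n+1}.
        x = x + (v - f(x)) / df(x)
        xs.append(x)
    return xs

# Example: solve x**3 = 8 starting from x0 = 3.
xs = newton_sequence(lambda x: x ** 3, lambda x: 3 * x ** 2, v=8.0, x0=3.0)
```

The returned list is the Newton sequence $\{x_1, x_2, \ldots\}$ for the shift parameter $v$, so changing `v` or `x0` shows how the whole sequence depends on the data.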

Consider the optimization problem (3). Representing the convex polyhedral
set $C$ as $C = \{\, x \in \mathbb{R}^n \mid Bx \leq b \,\}$ for some matrix
$B$ and a vector $b$ of compatible dimensions, the normal cone to the set
$C$ at $x$ has the form

$$N_C(x) = \{\, v \in \mathbb{R}^n \mid v = B^T y \text{ for some } y \geq 0 \text{ with } y \perp (Bx - b) \,\}.$$

Then the Newton method (5) applied to the variational inequality (4)
becomes

$$-v + \nabla g(x_n) + \nabla^2 g(x_n)(x_{n+1} - x_n) + B^T y_{n+1} = 0,$$
$$Bx_{n+1} \leq b, \quad y_{n+1} \geq 0, \quad (Bx_{n+1} - b)^T y_{n+1} = 0. \qquad (6)$$

This is the best-known form of the sequential quadratic programming
(SQP) method for the problem (3); indeed, an iteration in (6) is obtained
by solving a quadratic programming problem.
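A one-dimensional instance makes iteration (6) explicit. Assume, purely for illustration, that $n = 1$, $C = \{x \mid x \leq b\}$ (so $B = 1$) and $g'' > 0$; then the quadratic subproblem can be solved in closed form, as in the Python sketch below. The cost $g(x) = x^4/4 + x^2/2$ and all function names are hypothetical.

```python
def sqp_step(dg, d2g, v, b, x):
    """One iteration of (6) for n = 1, C = {x <= b}, assuming d2g(x) > 0."""
    z = x + (v - dg(x)) / d2g(x)          # unconstrained Newton point
    if z <= b:
        return z, 0.0                     # constraint inactive, multiplier 0
    # Constraint active: x_{n+1} = b and y_{n+1} follows from the first line of (6).
    y = v - dg(x) - d2g(x) * (b - x)
    return b, y

def sqp_sequence(dg, d2g, v, b, x0, steps=20):
    x, out = x0, []
    for _ in range(steps):
        x, y = sqp_step(dg, d2g, v, b, x)
        out.append((x, y))
    return out

dg = lambda x: x ** 3 + x                 # g'(x) for g(x) = x**4/4 + x**2/2
d2g = lambda x: 3 * x ** 2 + 1            # g''(x), positive everywhere

active = sqp_sequence(dg, d2g, v=10.0, b=1.0, x0=0.0)
inactive = sqp_sequence(dg, d2g, v=10.0, b=5.0, x0=0.0)
```

With $b = 1$ the bound stays active and the iterates settle at $x = 1$ with a positive multiplier; with $b = 5$ the bound becomes inactive and the iteration reduces to Newton's method for $g'(x) = v$.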

For given $x$ and $v$, let $\xi = \{x_1, x_2, \ldots, x_n, \ldots\}$ be a
Newton sequence, that is, a sequence starting from $x$ and satisfying (5).
Denote by $N(x, v)$ the set of all Newton sequences, starting from the point
$x$ for $v$. Let $\xi^* = \{x^*, x^*, \ldots, x^*, \ldots\}$, that is,
$\xi^*$ is the constant sequence with all elements $x^*$. Note that
$\xi^* \in N(x^*, v^*)$. We equip the set of Newton sequences with a
distance induced by the $l_\infty$ norm:

$$\|\xi\|_\infty = \sup_{n \geq 1} \|x_n\|.$$

In this paper we study continuity properties of the map
$(x, v) \mapsto N(x, v)$ in relation to corresponding properties of the map
$S$ defined in (2). We consider three continuity properties which play
central roles in quantitative stability analysis of variational problems:
the local upper-Lipschitz property, the Aubin continuity, and the
Lipschitzian localization property. We also provide some new insights into
convergence properties of Newton’s method.

In this paper we apply the basic principle of smooth nonlinear analysis,
stated by A. Ioffe [10] in the following way: a property of the derivative
of a nonlinear map at a given point should hold, in a sufficiently small
neighborhood, for the map itself. In the context of the inclusion (1) and
the corresponding solution map $S = (f + F)^{-1}$, this principle means
that adding terms of order $o(x)$ to (1) does not change certain properties
of the map $S$. Since replacing $f$ by its linearization at $x^*$
corresponds to adding $o(x - x^*)$, we call this specific form of the basic
principle invariance under linearization.

The basic principle of smooth nonlinear analysis is present already in 
the classical implicit function theorem and is even more explicit in the 
Lyusternik and Graves theorems. In his seminal paper [21], Robinson 
extended this principle to variational inequalities and optimization prob- 
lems. The main idea in the present paper is to apply the basic principle 
to Lipschitz-type properties of Newton’s map N defined above. 

In Section 2 we study the local upper-Lipschitz continuity, a relatively 
recently introduced concept which turns out to be quite natural in non- 
linear optimization. We present characterizations of the upper-Lipschitz 
continuity for the map S and for the Argmin map of the problem (3). In 
particular, we show that the local upper-Lipschitz continuity is invariant 
under linearization. We prove that the Argmin map of the problem (3) 
possesses this property if and only if the standard second-order sufficient 
optimality condition holds. 

In Section 3 we first show that the local upper-Lipschitz continuity of 
the map S implies that every Newton sequence within a sufficiently small 
ball around the solution is quadratically convergent. Then we study the 
local upper-Lipschitz continuity of the Newton map N. We prove that if 
the map S is locally upper-Lipschitz, then the map N is locally upper-Lipschitz
as well (in the supremum norm). We also show that the converse
implication holds if, in addition, N is locally nonempty-valued. For the
optimization problem (3) this result evolves into the equivalence between
the second-order sufficient optimality condition and the property that
the Newton (SQP) map is locally upper-Lipschitz and nonempty-valued.

At the end of Section 3 we briefly show that analogous results can be 
obtained for the proximal point method. 

In Section 4 we discuss the Aubin continuity by following the pattern 
of the previous sections. We give a version of the Lyusternik-Graves 
theorem (Lemma 4.1) and use it to show that the Aubin continuity of 
the map S implies existence of locally quadratically convergent New- 
ton sequences around the reference point. Further, the Newton map 
N is Aubin continuous if the map S is Aubin continuous. As a partial 
converse, the Aubin continuity of the submap of N associated with all 
convergent Newton sequences implies the Aubin continuity of the map 
S. 

Section 5 is devoted to the Lipschitzian localization, a continuity property
which appears in implicit function theorems and is best studied in
optimization. We show that the Lipschitzian localization of the map 
S implies the existence of a unique Newton sequence around the refer- 
ence point. Also, the Newton map N has a Lipschitzian localization if 
and only if the map S possesses this property. For the problem (3) the 
Lipschitzian localization of the Argmin map is equivalent to the strong 
second-order sufficient optimality condition. 

There is a vast literature on the continuity concepts discussed in the 
present paper and their applications to various variational problems. 
For recent discussions and references see the bibliographical comments 
in [10, 12, 25]. 

As a further application of the approach presented in this paper, we 
target error analysis of discrete approximations to infinite-dimensional 
variational problems for which the parameter v represents the discretization
remainder while v = v* is identified with the original (continuous)
problem. We shall not discuss here discrete approximations; for first 
results in this direction, see the forthcoming paper [6]. 

Throughout we denote by ||·|| any norm in R^n, by B_r(x) the closed
ball with center x and radius r, and by B the ball B_1(0). In writing "f
maps X into Y" we mean that the domain of f is a (possibly proper)
subset of X; thus, a set-valued map F from X to the subsets of Y may
have empty values for some points of X. Given a map F from X to the
subsets of Y, we define graph F = {(x, y) ∈ X × Y | y ∈ F(x)} and
F^{-1}(y) = {x ∈ X | y ∈ F(x)}. We denote by dist(x, A) the distance
from a point x to a set A and by ^T the transposition of a matrix or a
vector. Throughout, L is a Lipschitz constant of ∇f(·) in a (sufficiently
large) ball centered at x*.

2. LOCAL UPPER-LIPSCHITZ CONTINUITY 

The local upper-Lipschitz continuity is a localized version, for the 
graph of a map, of the upper-Lipschitz continuity introduced by Robin- 
son [20]. 

Definition 2.1 A map Γ : R^m → R^n is locally upper-Lipschitz continuous
at (y*, x*) ∈ graph Γ with constants a and b for neighborhoods and
c for growth if

$$\Gamma(y) \cap B_a(x^*) \subset \{x^*\} + c\,\|y - y^*\|\,B \quad \text{for all } y \in B_b(y^*).$$

The following properties can be deduced directly from the definition: 
A locally upper-Lipschitz map Γ at (y*, x*) necessarily has x* as a locally
unique (isolated) point of Γ(y*). If a map is locally upper-Lipschitz with
constants a, b and c, then for every 0 < a' < a, 0 < b' < b, and c' > c,
it is locally upper-Lipschitz with constants a', b' and c'. If a map Γ is
locally upper-Lipschitz at (y*, x*) and S is a local selection of Γ around
(y*, x*), that is, for some neighborhoods U of x* and V of y* we have
S(y) ∩ U ⊂ Γ(y) ∩ U for y ∈ V, then S is locally upper-Lipschitz at
(y*, x*).

Robinson [22] proved that in finite dimensions every map whose graph
is a polyhedral (possibly nonconvex) set is upper-Lipschitz at every point
of its domain. As a consequence, every map Γ : R^m → R^n for which x*
is an isolated point of Γ(y*) and graph Γ is a polyhedral set is locally
upper-Lipschitz at (y*, x*). Such maps are for instance the solution
maps of the linear variational inequalities over convex polyhedral sets 
when the reference solution is isolated. We note that Robinson used in 
[20] a different definition of the local upper-Lipschitz continuity where 
the localization is in the domain of the map. 

Rockafellar [23], see also [11, 15], gave a characterization of the upper- 
Lipschitz property by employing the so-called proto-derivatives (contin- 
gent derivatives). We showed in [4] that the local upper-Lipschitz con- 
tinuity is invariant under linearization in a very abstract setting (see 
Lemma 2.1 and Corollary 2.1 below). The upper-Lipschitz property was 
studied in [8] for a general nonlinear programming problem with canonical
parameters in both the functional and the constraints. In [8], Theorem
2.6, it was proved that the Karush-Kuhn-Tucker map has this property
with the reference point being a local optimal solution exactly when
the combination of the strict Mangasarian-Fromovitz condition and the
standard second-order sufficient optimality condition holds. Further
extensions and generalizations of this result for nonsmooth mathematical
programs, as well as extended surveys on the subject, are given in the 
recent papers [9, 12, 16]. 

In a related vein, Zolezzi introduced and studied in [27] a condition
number for the solutions to abstract optimization problems of the form
min f(x, y), x ∈ X, where y is a parameter. This condition number is
precisely the growth constant c in the definition of the upper-Lipschitz
property, where Γ(y) is the set of optimal solutions for y.

Note that the local upper-Lipschitz property does not guarantee nonemptiness
of the values of Γ near the reference point y* (as an example,
take Γ(y*) = {x*} and Γ(y) = ∅ for y ≠ y*). Such a requirement leads to
a different concept, namely:

Definition 2.2 A map Γ : R^m → R^n is locally nonempty-valued at
(y*, x*) ∈ graph Γ if there exist positive numbers a and b such that

$$\Gamma(y) \cap B_a(x^*) \neq \emptyset \quad \text{for all } y \in B_b(y^*).$$

A map Γ is locally nonempty-valued at (y*, x*) if and only if the
map Γ^{-1} is open at (x*, y*). If Γ(y) is the set of solutions of a system
of inequalities with data y (matrices, vectors), then the radius b of the
neighborhood is a kind of bound on perturbations of the data from y*
such that the system still has a solution x within distance a from the
reference solution x* for y*. With a = ∞ and after normalization, the
supremum of such b is exactly the relative condition number introduced
by Renegar [19].

The following lemma shows the invariance under linearization of the 
upper-Lipschitz continuity in a transparent form which is particularly 
suitable for applications: 

Lemma 2.1 Let G be a map from R^n to the subsets of R^m and let
G^{-1} be locally upper-Lipschitz at (v*, x*) with constants a and b for
neighborhoods and c for growth. Let the nonnegative constants α, β and
λ satisfy

$$\alpha \le a, \qquad \lambda c < 1, \qquad \beta + \lambda\alpha \le b. \tag{7}$$

Let h : R^n → R^m be a function which satisfies

$$\|h(x) - h(x^*)\| \le \lambda \|x - x^*\| \quad \text{for all } x \in B_\alpha(x^*). \tag{8}$$

Then the map (h + G)^{-1} is locally upper-Lipschitz at (v* + h(x*), x*)
with constants α and β for neighborhoods and c/(1 - λc) for growth.

Proof. Choose α, β and λ as in (7) and let h satisfy (8). Put
y* = v* + h(x*) and let y ∈ B_β(y*) and x ∈ (h + G)^{-1}(y) ∩ B_α(x*).
Then x ∈ G^{-1}(y - h(x)) ∩ B_α(x*) and

$$\|y - h(x) - v^*\| \le \|y - y^*\| + \|h(x) - h(x^*)\| \le \beta + \lambda\alpha \le b.$$

Hence, from the upper-Lipschitz continuity of G^{-1} at (v*, x*) we obtain

$$\|x - x^*\| \le c\|y - h(x) - v^*\| \le c\|y - y^*\| + c\|h(x) - h(x^*)\| \le c\|y - y^*\| + c\lambda\|x - x^*\|.$$

Thus,

$$\|x - x^*\| \le \frac{c}{1 - \lambda c}\,\|y - y^*\|,$$

and the proof is complete. □



It is clear from the proof that this lemma still holds if we replace R^n
with any metric space and R^m with a linear space with a shift-invariant
metric.
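
The growth estimate of Lemma 2.1 can be checked numerically on a small example. The sketch below is only an illustration under assumed data that do not come from the paper: in one dimension we take G(x) = 2x + N_C(x) with C = [0, ∞), so that G^{-1}(v) = max(0, v/2) is upper-Lipschitz at (v*, x*) = (0, 0) with c = 1/2, and the perturbation h(x) = λ sin(x) with λ = 1/2. The helper names `growth_bound` and `solve_perturbed` are hypothetical.

```python
import math

def growth_bound(c, lam):
    """Growth constant c / (1 - lam * c) predicted by Lemma 2.1 (needs lam * c < 1)."""
    assert lam * c < 1.0
    return c / (1.0 - lam * c)

def solve_perturbed(y, lam, hi=10.0, tol=1e-13):
    """Solve y in h(x) + G(x) for the assumed toy data
    G(x) = 2x + N_C(x), C = [0, inf), and h(x) = lam * sin(x)."""
    phi = lambda x: lam * math.sin(x) + 2.0 * x - y  # strictly increasing for lam < 2
    if phi(0.0) >= 0.0:
        return 0.0  # at x = 0 the normal cone absorbs the nonnegative residual
    a, b = 0.0, hi
    while b - a > tol:  # plain bisection for the positive root of phi
        mid = 0.5 * (a + b)
        if phi(mid) < 0.0:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)

lam, c = 0.5, 0.5
bound = growth_bound(c, lam)
samples = [-0.3, -0.01, 0.0, 0.01, 0.1, 0.5, 1.0]
solutions = [solve_perturbed(y, lam) for y in samples]
for y, x in zip(samples, solutions):
    print(f"y = {y:+.2f}  x = {x:.6f}  bound * |y| = {bound * abs(y):.6f}")
```

In this example every computed solution x satisfies |x - x*| ≤ (c/(1 - λc)) |y - y*|, as the lemma predicts; for y ≤ 0 the constraint is active and x = 0.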

Recall that S(v) denotes the set of solutions to (1) for v. With the
solution map S we associate the map

$$v \mapsto L(v) = \big(f(x^*) + \nabla f(x^*)(\cdot - x^*) + F(\cdot)\big)^{-1}(v)$$

obtained by the linearization of f at the point x*. We have x* ∈ L(v*)
iff x* ∈ S(v*). From Lemma 2.1 we obtain the following result for the
solutions of (1), first stated in [4], Theorem 3.2.

Corollary 2.1 The following are equivalent:

(i) The map L is locally upper-Lipschitz at (v*, x*).

(ii) The map S is locally upper-Lipschitz at (v*, x*).

We note that the local nonempty-valuedness of a map (Definition 2.2)
is not invariant under linearization.

In our analysis of Newton’s method we use the following map: 

$$(v, x) \mapsto P(v, x) := \big(f(x) + \nabla f(x)(\cdot - x) + F(\cdot)\big)^{-1}(v). \tag{9}$$

If x* is a solution to (1) for v*, then x* ∈ P(v*, x*). Also, L(v) =
P(v, x*). The following corollary complements Corollary 2.1.

Corollary 2.2 The following are equivalent:

(i) The map L is locally upper-Lipschitz at (v*, x*).

(ii) The map P is locally upper-Lipschitz at ((v*, x*), x*).

Proof. The implication (ii) ⇒ (i) is immediate by noting that L(v) =
P(v, x*). Let us prove (i) ⇒ (ii). Let L be locally upper-Lipschitz at
(v*, x*) with constants a, b and c and let the positive numbers α and β
satisfy the inequalities (7) with λ = Lβ. (Recall that, here and later, L
is a Lipschitz constant of ∇f(·) in a (sufficiently large) ball centered at
x*; in this case this is the ball with radius α.)

Let (v, x) ∈ B_β((v*, x*)) and let z ∈ P(v, x) ∩ B_α(x*). We apply
Lemma 2.1 with

$$G(z) = f(x^*) + \nabla f(x^*)(z - x^*) + F(z)$$

and

$$h(z) = f(x) + \nabla f(x)(z - x) - f(x^*) - \nabla f(x^*)(z - x^*).$$

For any z' ∈ B_α(x*) we have

$$\|h(z') - h(x^*)\| = \|(\nabla f(x) - \nabla f(x^*))(z' - x^*)\| \le L\beta\|z' - x^*\| = \lambda\|z' - x^*\|,$$

hence (8) holds. By assumption, G^{-1} is locally upper-Lipschitz at
(v*, x*), hence from Lemma 2.1 we obtain that the map (h + G)^{-1} is
locally upper-Lipschitz at (v* + h(x*), x*). Note that z ∈ (h + G)^{-1}(v) ∩
B_α(x*). Then, for γ = c/(1 - λc), we have

$$\|z - x^*\| \le \gamma\|v - v^* - f(x) - \nabla f(x)(x^* - x) + f(x^*)\| \le \gamma\|v - v^*\| + \tfrac{1}{2}\gamma L\beta\|x - x^*\|.$$

Thus the map P is locally upper-Lipschitz with constants α and β for
the radii of the neighborhoods and growth constants γ for v and γLβ/2
for x, respectively. □



Remark 2.1 Note that the growth constant of P with respect to x can
be made arbitrarily small, by choosing β small.

Consider the optimization problem (3) and the corresponding first-order
necessary optimality condition (4). Let x* be a solution of (4) for
v = v*. Denote by K the critical cone at (v*, x*), that is,

$$K = \{x \in T_C(x^*) \mid v^* - \nabla g(x^*) \perp x\},$$

where T_C(x*) is the tangent cone to C at the point x*. Then the (standard)
second-order sufficient optimality condition at (v*, x*) has the
form

$$\langle u, \nabla^2 g(x^*) u \rangle > 0 \quad \text{for all nonzero } u \in K. \tag{10}$$

Let Argmin(v) be the set of local solutions for v to the optimization
problem (3).

Theorem 2.1 The following are equivalent:

(i) The second-order sufficient optimality condition (10) holds at
(v*, x*).

(ii) The Argmin map is locally nonempty-valued and upper-Lipschitz
continuous at (v*, x*).

Proof. Let X be the map of the critical points,

$$X(v) = \{x \in \mathbf{R}^n \mid v \in \nabla g(x) + N_C(x)\}. \tag{11}$$

The solution map of the linearization of (4) at x* is defined as

$$v \mapsto \mathcal{L}(v) = \{x \in \mathbf{R}^n \mid v \in \nabla g(x^*) + \nabla^2 g(x^*)(x - x^*) + N_C(x)\}. \tag{12}$$

Let (i) hold. For short, denote a = ∇g(x*) and A = ∇²g(x*). First, let
us show that x* is an isolated solution of the linear variational inequality

$$v^* \in a + A(x - x^*) + N_C(x). \tag{13}$$

On the contrary, suppose that there exists a sequence x_n → x* such that

$$v^* \in a + A(x_n - x^*) + N_C(x_n) \quad \text{for all } n.$$

Then

$$v^* \in a + \nabla^2 g(x_n)(x_n - x^*) + o(\|x_n - x^*\|) + N_C(x^* + (x_n - x^*)).$$

By the Reduction Lemma in [7],

$$0 \in \nabla^2 g(x_n)(x_n - x^*) + o(\|x_n - x^*\|) + N_K(x_n - x^*);$$

that is,

$$\langle \nabla^2 g(x_n)(x_n - x^*), x_n - x^* \rangle + o(\|x_n - x^*\|^2) = 0. \tag{14}$$

Since

$$b_n := \frac{x_n - x^*}{\|x_n - x^*\|} \in K \quad \text{for all } n \text{ and } \|b_n\| = 1,$$

we obtain that a subsequence of b_n is convergent to, say, u ∈ K. From
(14) we get

$$\langle \nabla^2 g(x_n) b_n, b_n \rangle + O(\|x_n - x^*\|) = 0.$$

Passing to the limit with (the subsequence of) n → ∞ in the latter
equality we come to a contradiction with (i). Thus x* is an isolated
solution of (13).

As noted at the beginning of this section, since 𝓛 is a polyhedral
map, from a result of Robinson [22] we obtain that the requirement that x*
be an isolated point of 𝓛(v*) is equivalent to the local upper-Lipschitz
continuity of 𝓛 at (v*, x*). Then the local upper-Lipschitz continuity of
the critical point map X at (v*, x*) follows from the invariance under
linearization (Corollary 2.1).

Consider the problem (3) with the additional constraint ||x - x*|| ≤ δ,
for δ > 0. By the Berge theorem, the solution map Argmin_δ of the
new problem is upper semicontinuous at v = v*. Clearly, Argmin_δ is
nonempty-valued and Argmin_δ(v*) = {x*} for δ sufficiently small. One
then sees that for v close to v* eventually the constraint ||x - x*|| ≤ δ is
not active so that Argmin_δ(v) = Argmin(v) ∩ B_δ(x*). Thus, Argmin is
locally nonempty-valued at (v*, x*). Since Argmin(v) ⊂ X(v), we obtain
that (ii) holds.

Conversely, let Argmin be locally nonempty-valued and upper-Lipschitz
continuous at (v*, x*) with a growth constant κ and neighborhoods
U and V. Without loss of generality, assume v* = 0. Take any x ∈ C
close to x* such that v = (x - x*)/(2κ) ∈ V. Let x_v ∈ Argmin(v) ∩ U be an
associated local minimizer; then ||x_v - x*|| ≤ κ||v||. From the optimality
we have

$$g(x) - \langle v, x \rangle \ge g(x_v) - \langle v, x_v \rangle,$$

that is,

$$g(x) - \frac{1}{2\kappa}\langle x - x^*, x - x^* \rangle \ge g(x_v) - \frac{1}{2\kappa}\langle x - x^*, x_v - x^* \rangle.$$

Since g(x_v) ≥ g(x*) (remember that v* = 0), we continue the chain of
inequalities obtaining

$$\begin{aligned}
g(x) - \frac{1}{2\kappa}\langle x - x^*, x - x^* \rangle
&\ge g(x^*) - \frac{1}{2\kappa}\|x - x^*\|\,\|x_v - x^*\| \\
&\ge g(x^*) - \frac{1}{2\kappa}\|x - x^*\|\,\kappa\|v\| \\
&= g(x^*) - \frac{1}{4\kappa}\|x - x^*\|^2.
\end{aligned}$$

Thus

$$g(x) \ge g(x^*) + \frac{1}{4\kappa}\|x - x^*\|^2$$

for every x close to x*. We obtain the so-called growth condition of
order two which, as is well known, see e.g. [25], p. 606, is equivalent to
the second-order sufficient condition (10). □



The above result clarifies the meaning of the second-order sufficient 
condition: it gives not only optimality at the reference point but also 
a certain continuity property of the minimizers with respect to certain 
perturbations and this property is exactly captured by the local upper- 
Lipschitz continuity. 
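
As a sanity check on the link between the upper-Lipschitz constant κ of the Argmin map and quadratic growth of the form g(x) ≥ g(x*) + (1/(4κ))||x - x*||², consider an illustrative special case that is not taken from the paper: C = R^n, g(x) = ½⟨x, Qx⟩ with Q symmetric positive definite, the Euclidean norm, and v* = 0, x* = 0. Under these assumptions everything can be computed by hand:

```latex
% Illustrative assumption (not from the source): C = R^n, g(x) = (1/2)<x, Qx>,
% Q symmetric positive definite, Euclidean norm, v^* = 0, x^* = 0.
\begin{align*}
\operatorname{Argmin}(v) &= \{Q^{-1} v\}, \qquad
\|Q^{-1} v - x^*\| \le \kappa \|v - v^*\|
\quad\text{with}\quad \kappa = \|Q^{-1}\| = \frac{1}{\lambda_{\min}(Q)}, \\
g(x) - g(x^*) &= \tfrac{1}{2}\langle x, Qx \rangle
\ge \tfrac{1}{2}\lambda_{\min}(Q)\,\|x\|^2
= \frac{1}{2\kappa}\|x - x^*\|^2
\ge \frac{1}{4\kappa}\|x - x^*\|^2 .
\end{align*}
```

In this special case the growth constant 1/(4κ) is not sharp (the true constant is 1/(2κ)), which is consistent with the estimate being a one-sided bound.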

The approach presented in this section can be applied to other map- 
pings in variational analysis. As an example consider the map 

$$(v, u) \mapsto M_\mu(v, u) = \big(f(\cdot) + \mu(\cdot - u) + F(\cdot)\big)^{-1}(v), \tag{15}$$

where μ is a positive scalar. In the context of our illustrative problem
(3), the map M_μ is the stationary point map of the prox-regularization
(Moreau-Yosida regularization) of (3):

$$\min_{x \in C} \; g(x) - \langle v, x \rangle + \tfrac{1}{2}\mu\|x - u\|^2 .$$

A straightforward application of Lemma 2.1 gives us

Corollary 2.3 Let the map S be locally upper-Lipschitz at (v*, x*) with
a growth constant c. Then for every positive μ < 1/c the map M_μ is
locally upper-Lipschitz at ((v*, x*), x*) with growth constants c/(1 - cμ)
for v and cμ/(1 - cμ) for x.
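
The growth constants c/(1 - cμ) for v and cμ/(1 - cμ) for the second argument can be traced back to Lemma 2.1. The following computation is a sketch of the omitted step, under our reading that the lemma is applied with G = f + F (so that G^{-1} = S) and h(x) = μ(x - u), which gives λ = μ; here x denotes any point of M_μ(v, u) close to x*:

```latex
% Sketch: Lemma 2.1 with G = f + F and h(x) = \mu (x - u), hence \lambda = \mu and \lambda c = c\mu < 1.
\begin{align*}
\|h(x) - h(x^*)\| &= \mu\,\|x - x^*\|, \\
\|x - x^*\| &\le \frac{c}{1 - c\mu}\,\bigl\|v - \bigl(v^* + h(x^*)\bigr)\bigr\|
 = \frac{c}{1 - c\mu}\,\|v - v^* - \mu(x^* - u)\| \\
 &\le \frac{c}{1 - c\mu}\,\|v - v^*\| + \frac{c\mu}{1 - c\mu}\,\|u - x^*\| .
\end{align*}
```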

At the end of the following section we apply this result to show con- 
vergence of the proximal-point method. 

3. LOCAL UPPER-LIPSCHITZ CONTINUITY 
OF THE NEWTON MAP 

Consider the following perturbed version of the Newton method (5)
in which ∇f(x_n) is replaced by a sequence of matrices H_n and the parameter
v may vary from step to step:

$$v_n \in f(x_n) + H_n(x_{n+1} - x_n) + F(x_{n+1}) \quad \text{for } n = 0, 1, \dots. \tag{16}$$



Our first result relates the local upper-Lipschitz property of the map 
L (or, equivalently, S) to convergence of Newton sequences. 

Theorem 3.1 Let L be locally upper-Lipschitz at (v*, x*) with constants
a and b for neighborhoods and c for growth. Let the positive constants σ
and κ satisfy

$$\sigma \le a, \qquad c\kappa < 1, \qquad \kappa(1 + 2\sigma) + \tfrac{1}{2}L\sigma^2 \le b, \tag{17}$$

let {v_n} be a sequence of vectors with v_n ∈ B_κ(v*), n = 1, 2, ..., and
let {H_n} be a sequence of matrices with H_n ∈ B_κ(∇f(x*)), n = 0, 1, ....
Then for every initial point x_0 ∈ B_σ(x*), every sequence {x_n} obtained
from (16) and whose elements are all in B_σ(x*) for n = 1, 2, ... satisfies

$$\|x_{n+1} - x^*\| \le \frac{c}{1 - c\kappa}\Big(\tfrac{1}{2}L\|x_n - x^*\|^2 + \|(H_n - \nabla f(x^*))(x_n - x^*)\| + \|v_n - v^*\|\Big). \tag{18}$$



Proof. Choose κ and σ such that the inequalities (17) hold and let
{x_n} be a sequence satisfying the conditions of the theorem. Fix n ≥ 1.
We apply Lemma 2.1 with G = L^{-1}, h(x) = f(x_n) + H_n(x - x_n) - f(x*) -
∇f(x*)(x - x*), α = σ, β = κ(1 + σ) + ½Lσ², λ = κ. The so defined
α, β and λ satisfy (7) and h satisfies (8). From Lemma 2.1, the map
(h + G)^{-1} is locally upper-Lipschitz at (v* + h(x*), x*) with constants α
and β for the neighborhoods and a growth constant c/(1 - cλ). We have

$$x_{n+1} \in (h + G)^{-1}(v_n) \cap B_\sigma(x^*)$$

and

$$\begin{aligned}
\|v_n - v^* - h(x^*)\| &\le \|v_n - v^*\| + \|f(x_n) + H_n(x^* - x_n) - f(x^*)\| \\
&\le \|v_n - v^*\| + \|(H_n - \nabla f(x^*))(x_n - x^*)\| + \tfrac{1}{2}L\|x_n - x^*\|^2 \\
&\le \kappa + \kappa\sigma + \tfrac{1}{2}L\sigma^2 = \beta.
\end{aligned}$$

Hence, from the upper-Lipschitz continuity of (h + G)^{-1}, we obtain (18).
□



Corollary 3.1 Let the assumptions of Theorem 3.1 hold, let v_n = v*
for all n, and let σ and κ be the associated constants. Choose σ and κ
smaller if necessary so that

$$\frac{c\,(\tfrac{1}{2}L\sigma + \kappa)}{1 - c\kappa} < 1.$$

Then every Newton sequence {x_n} obtained from (16) with x_0 ∈ B_σ(x*) either has
all elements outside the ball B_σ(x*) or:

(a) x_n is linearly convergent to x*;

(b) if H_n → ∇f(x*) as n → ∞, then the sequence x_n is superlinearly
convergent to x*;

(c) if H_n = ∇f(x_n), then x_n is quadratically convergent to x*.

Proof. Observe that, on our assumptions, if {x_n} is a Newton sequence
and for some i > 0 we have x_i ∈ B_σ(x*), then x_n ∈ B_σ(x*) for
all n > i. Then apply (18). □
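
The difference between cases (a) and (c) can be observed on a one-dimensional variational inequality. The sketch below uses assumed data that are not from the paper: F = N_C with C = [0, ∞), f(x) = exp(x) - 2 and v = 0, so that x* = ln 2. For a positive slope m, the linearized inclusion in (16) has the unique solution max(0, x_n - (f(x_n) - v)/m); the function name `newton_vi` and the fixed slope H = 2.5 are hypothetical choices for illustration.

```python
import math

def newton_vi(f, df, x0, v=0.0, steps=8, H=None):
    """Iteration (16) in one dimension for v in f(x) + N_C(x), C = [0, inf).

    If H is None the exact derivative is used (case (c)); otherwise the
    fixed positive slope H replaces it (a perturbed method, case (a)).
    """
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        m = df(x) if H is None else H
        # Unique solution of the linearized inclusion when m > 0.
        xs.append(max(0.0, x - (f(x) - v) / m))
    return xs

f = lambda x: math.exp(x) - 2.0
df = lambda x: math.exp(x)
x_star = math.log(2.0)

exact = newton_vi(f, df, x0=1.0)          # H_n = f'(x_n)
fixed = newton_vi(f, df, x0=1.0, H=2.5)   # H_n = 2.5 for all n

err_exact = [abs(x - x_star) for x in exact]
err_fixed = [abs(x - x_star) for x in fixed]
for n, (ea, eb) in enumerate(zip(err_exact, err_fixed)):
    print(f"n = {n}  exact-slope error = {ea:.3e}  fixed-slope error = {eb:.3e}")
```

With the exact derivative the error is roughly squared at each step, while the fixed slope gives a constant contraction factor close to 1 - f'(x*)/H = 0.2, in line with the linear and quadratic rates stated in the corollary.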



In the remaining part of this section we consider the unperturbed 
Newton method (5) with a parameter v. Recall that N(x, v) is the set
of all Newton sequences satisfying (5) for v which start from the point 
X. The following theorem shows the close relation of the map N with 
the map P defined in (9), and hence with the maps S and L according 
to Corollaries 2.1 and 2.2. 

Theorem 3.2 The following are equivalent:

(i) The map P is locally upper-Lipschitz and nonempty-valued at
((v*, x*), x*);

(ii) The map N is locally upper-Lipschitz and nonempty-valued at
((x*, v*), ξ*).

Proof. (i) ⇒ (ii). If P is locally nonempty-valued, then for each (v, x)
close to (v*, x*) there exists a Newton step, hence N is locally nonempty-valued
as well. Let P be locally upper-Lipschitz with constants a and
b for the neighborhoods and c and μ for the growth of v and x, respectively.
According to Remark 2.1, the constant μ can be made arbitrarily
small by choosing smaller b. Thus, without loss of generality, suppose
that μ < 1. Let ξ = {x_1, x_2, ..., x_n, ...} ∈ N(x, v) ∩ B_a(ξ*) for some
(x, v) ∈ B_b((x*, v*)). Then x_1 ∈ P(v, x) ∩ B_a(x*) and from the local
upper-Lipschitz continuity of P we have

$$\|x_1 - x^*\| \le c\|v - v^*\| + \mu\|x - x^*\|.$$

Proceeding by induction, from x_{n+1} ∈ P(v, x_n) ∩ B_a(x*) and from the
local upper-Lipschitz continuity of P, we obtain

$$\|x_{n+1} - x^*\| \le c(1 + \mu + \mu^2 + \dots + \mu^n)\|v - v^*\| + \mu^{n+1}\|x - x^*\|.$$

Then,

$$\sup_{n \ge 1} \|x_{n+1} - x^*\| \le \frac{c}{1 - \mu}\|v - v^*\| + \mu\|x - x^*\|.$$

We obtain that the map N is locally upper-Lipschitz at ((x*, v*), ξ*)
with constants a, b for the neighborhoods, and growth constants μ for x
and c/(1 - μ) for v.

(ii) ⇒ (i). Let N be locally upper-Lipschitz and locally nonempty-valued
at ((x*, v*), ξ*) with constants a, b and c. Let (x, v) ∈ B_b((x*, v*))
and z ∈ P(v, x) ∩ B_a(x*). Then z is the first element of a Newton
sequence starting at x for v. From the nonemptiness assumption for N,
there exists a Newton sequence ξ = {x_1, ..., x_n, ...} ∈ N(z, v) ∩ B_a(ξ*).
Observe that the sequence η = {z, x_1, x_2, ..., x_n, ...} is an element of
N(x, v) ∩ B_a(ξ*). From the local upper-Lipschitz continuity of N (in the
supremum norm), the first component of η satisfies

$$\|z - x^*\| \le c(\|x - x^*\| + \|v - v^*\|).$$

This means exactly that P is locally upper-Lipschitz at ((v*, x*), x*).
The local nonemptiness of N(x, v) ∩ B_a(ξ*) implies that P is locally
nonempty-valued. □

Remark 3.1 Observe that, from the above proof, the local upper-Lipschitz
continuity of P implies the local upper-Lipschitz continuity of N
without the requirement that P be locally nonempty-valued.

Theorem 3.3 If the map S is locally upper-Lipschitz at (v*, x*), then
N is locally upper-Lipschitz at ((x*, v*), ξ*). Conversely, if the map N
is locally upper-Lipschitz and nonempty-valued at ((x*, v*), ξ*), then S
is locally upper-Lipschitz at (v*, x*).

Proof. If S is locally upper-Lipschitz, then, from Corollaries 2.1 and
2.2, the map P is locally upper-Lipschitz at ((v*, x*), x*); then Theorem
3.2 (with Remark 3.1) completes the proof. Conversely, if N is locally
upper-Lipschitz and nonempty-valued at ((x*, v*), ξ*), then, from
Theorem 3.2, P is locally upper-Lipschitz at ((v*, x*), x*), hence, from
Corollaries 2.1 and 2.2, S is locally upper-Lipschitz at (v*, x*). □



As an illustration, consider the optimization problem (3). The Newton
(SQP) iterate x_{n+1} from x_n defined in (6) is a stationary point of the
following quadratic program:

$$\min_{z \in C} \; \tfrac{1}{2}\langle \nabla^2 g(x_n)(z - x_n), z - x_n \rangle + \langle \nabla g(x_n) - v, z - x_n \rangle. \tag{19}$$

Under the second-order sufficient optimality condition, by using the
nonemptiness argument in the proof of Theorem 2.1, we conclude that
the intersection of the solution set of this problem for (v, x_n) with any
ball around x* is nonempty provided that (v, x_n) is sufficiently close to
(v*, x*). Thus, the corresponding map P is nonempty-valued locally
around ((v*, x*), x*). The converse follows from Theorems 2.1 and 3.2.
Thus we obtain



Corollary 3.2 The following are equivalent:

(i) The second-order sufficient optimality condition (10) holds at
(v*, x*).

(ii) The Newton map N defined as in (19) is locally upper-Lipschitz
and nonempty-valued at ((x*, v*), ξ*) and x* is a local solution of
(3) for v = v*.

Further, from Theorems 2.1, 3.1 and from Corollary 3.2 we obtain
that under the second-order sufficient optimality condition, every Newton
(SQP) sequence x_n within a sufficiently small neighborhood of the
solution x* is quadratically convergent to x*; moreover, the map of SQP
sequences has the local upper-Lipschitz property. Note that we do not
impose any conditions on regularity of the constraints. Also note that,
because of the polyhedrality and Robinson's theorem [22], the map from
v to the set of all sequences of Lagrange multipliers associated with
a Newton (SQP) sequence x_n satisfying (6) has the (standard) upper-Lipschitz
property.

For problems with nonlinear smooth constraints, the SQP method is
obtained when the Newton method is applied to the variational inequality
of the first-order optimality conditions (the Karush-Kuhn-Tucker
system). To be specific, consider the problem

$$\min g(z) \quad \text{subject to} \quad \varphi_j(z) \le 0, \; j = 1, 2, \dots, m, \quad z \in \mathbf{R}^n \tag{20}$$

with three times continuously differentiable g and φ = (φ_1, ..., φ_m)
and with a solution z* at which the Mangasarian-Fromovitz condition
holds (to guarantee the existence of Lagrange multipliers, say y*). The
Karush-Kuhn-Tucker optimality system has the form

$$\nabla g(z) + \nabla\varphi(z)^T y = 0, \qquad \varphi(z) \in N_{\mathbf{R}^m_+}(y).$$

The SQP method is then the Newton method (5) with x = (z, y),

$$f(x) = \begin{pmatrix} \nabla g(z) + \nabla\varphi(z)^T y \\ -\varphi(z) \end{pmatrix}
\quad \text{and} \quad
F(x) = \begin{pmatrix} \{0\} \\ N_{\mathbf{R}^m_+}(y) \end{pmatrix}.$$

Note that Corollary 3.1 now claims quadratic convergence of Newton
sequences for both the primal variables z and the Lagrange multipliers
y. From [8], Theorem 2.6, we obtain that this convergence is guaranteed
when the combination of the strict Mangasarian-Fromovitz condition
and the standard second-order sufficient optimality condition holds.

As another application of our approach, consider the well-known proximal
point method

$$v \in f(x_{n+1}) + \mu(x_{n+1} - x_n) + F(x_{n+1}), \tag{21}$$

where μ is a positive constant. Now the role of the map P is played by
the map M_μ defined in (15). Applying Corollary 2.3 we obtain that if the
map S is locally upper-Lipschitz with a growth constant c, then every
sequence generated by (21) for v* and within a sufficiently small ball
around x* is linearly convergent to x*. Further, let T_μ be the map from
the pair (x, v) to the set of all sequences obtained by (21) for v and μ and
starting from x. The constant sequence ξ* = {x*, x*, ...} ∈ T_μ(x*, v*).
By noting that the growth constant of the map M_μ defined in (15) can
be < 1 for μ sufficiently small and repeating the argument in Theorem
3.2, we obtain:

Theorem 3.4 If the mapping S is locally upper-Lipschitz at (v*, x*),
then for a sufficiently small positive μ the map T_μ is locally upper-Lipschitz
at ((x*, v*), ξ*).
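
The linear convergence of the proximal point iteration (21) can be seen on a small example. The sketch below relies on assumed data that are not from the paper: in one dimension, f(x) = kx - 1 with k = 2, F = N_C with C = [0, ∞), v = 0 and μ = 0.5. Then S(v) = max(0, (v + 1)/k), so c = 1/k and x* = 1/k; the function name `proximal_point` is hypothetical.

```python
def proximal_point(k, mu, x0, v=0.0, steps=20):
    """Iteration (21) for the assumed data f(x) = k*x - 1, F = N_C, C = [0, inf).

    Each step solves v in k*z - 1 + mu*(z - x_n) + N_C(z); since the
    single-valued part is strictly increasing in z, the unique solution is
    the unconstrained root clipped at zero.
    """
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        z = max(0.0, (v + 1.0 + mu * x) / (k + mu))
        xs.append(z)
    return xs

k, mu = 2.0, 0.5
c = 1.0 / k                       # growth constant of S in this example
x_star = 1.0 / k
xs = proximal_point(k, mu, x0=2.0)
errors = [abs(x - x_star) for x in xs]
ratios = [errors[n + 1] / errors[n] for n in range(5)]
predicted = c * mu / (1.0 - c * mu)   # growth constant for x in Corollary 2.3
print("observed contraction ratios:", ratios)
print("bound from Corollary 2.3:   ", predicted)
```

In this example the observed ratio is μ/(k + μ) = 0.2, which stays below the constant cμ/(1 - cμ) = 1/3 supplied by Corollary 2.3, so the iteration is linearly convergent as stated.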

There is a broad field of applications of our approach to other mod- 
els in optimization; in particular, to the extended nonlinear program- 
ming problem introduced by Rockafellar [24] which covers the conven- 
tional nonlinear programming models as well as penalty-function and 
augmented-Lagrangian type models. 

4. AUBIN CONTINUITY 

The concept of Aubin continuity can be traced back to the original 
proofs of the Lyusternik and Graves theorems, see [10, 25] for discussions. 
In mathematical programming it has appeared as "metric regularity"; in
a topological framework, it was called "openness with linear rate". J.-P.
Aubin was the first to define this concept as a continuity property of
set-valued maps, calling it "pseudo-Lipschitz continuity" [1]. Following
[7] we use the name "Aubin continuity".

Definition 4.1 Let Γ map R^m to the subsets of R^n and let (y*, x*) ∈
graph Γ. We say that Γ is Aubin continuous at (y*, x*) with constants
a, b for neighborhoods and c for growth if for every y', y'' ∈ B_b(y*) and
every x' ∈ Γ(y') ∩ B_a(x*), there exists x'' ∈ Γ(y'') such that

$$\|x' - x''\| \le c\,\|y' - y''\|.$$

Directly from the definition one can extract the following properties
of Aubin continuous maps. If a map Γ is Aubin continuous at (y*, x*)
with constants a, b and c, then for every 0 < a' < a and 0 < b' < b the
map Γ is Aubin continuous at (y*, x*) with constants a', b' and c. If,
in addition, b' ≤ a'/c, then Γ(y) ∩ B_{a'}(x*) ≠ ∅ for all y ∈ B_{b'}(y*). If
Γ is Aubin continuous at (y*, x*) with constants a, b and c, there exist
constants a', b' and δ such that for every (y, x) ∈ graph Γ ∩ B_δ((y*, x*))
the map Γ is Aubin continuous at (y, x) with constants a', b' and c.
Finally, for a set-valued map G, if G^{-1} is Aubin continuous at (y*, x*)
with constants a, b, and c, then for every ε < b the map

$$y \mapsto \{x \in \mathbf{R}^n \mid y \in G(x) + B_\varepsilon(0)\}$$

is Aubin continuous at (y*, x*) with constants a, b - ε, and c. An infinitesimal
characterization of the Aubin property is given by Mordukhovich's
coderivative criterion, see [25], Chapter 9.

We study the Aubin continuity by following the pattern established
in the previous two sections. Our first and basic observation is that,
similar to the upper-Lipschitz property, the Aubin continuity is invariant
under linearization. This fact is contained in the classical Lyusternik and
Graves theorems and their numerous generalizations, see, e.g., [2, 5, 14].
(It is perhaps less known that both Lyusternik and Graves used in their
proof a version of the Newton method, the so-called modified Newton
method, where, in terms of our notation in (5), ∇f(x_n) is replaced by
∇f(x*). In some of the Russian literature on the subject, see e.g. [3], the
modified Newton method is called "Lyusternik process".) This result is
presented below in a form very instrumental for applications.

Lemma 4.1 Let G map R^n to the subsets of R^m and let G^{-1} be Aubin
continuous at (v*, x*) with constants a and b for neighborhoods and c for
growth. Let the nonnegative constants α, β and λ satisfy

$$\alpha \le a, \qquad \lambda c < 1, \qquad \alpha + \frac{2c\beta}{1 - \lambda c} \le a, \qquad \beta + \lambda\Big(\alpha + \frac{2c\beta}{1 - \lambda c}\Big) \le b. \tag{22}$$

Then for every function h : R^n → R^m which is Lipschitz continuous
on B_a(x*) with a Lipschitz constant λ, the map (h + G)^{-1} is Aubin
continuous at (v* + h(x*), x*) with constants α and β for neighborhoods
and c/(1 - λc) for growth.

Proof. Let y', y'' ∈ B_β(y*), where y* = v* + h(x*), and let x' ∈
(h + G)^{-1}(y') ∩ B_α(x*). Then x' ∈ G^{-1}(y' - h(x')) ∩ B_α(x*) and for both
y = y' and y = y'' we have

$$\|y - h(x') - v^*\| \le \|y - y^*\| + \|h(x') - h(x^*)\| \le \beta + \lambda\alpha \le b.$$

From the Aubin continuity of G^{-1} we obtain that there exists x_2 ∈
G^{-1}(y'' - h(x')) such that

$$\|x_2 - x'\| \le c\|y' - y''\|.$$

Set x_1 = x'. By induction, suppose that there exists a sequence {x_k}
such that for all k = 2, 3, ..., n,

$$\|x_k - x_{k-1}\| \le (\lambda c)^{k-2}\|x_2 - x_1\|$$

and

$$y'' \in h(x_{k-1}) + G(x_k). \tag{23}$$

Using (22), we have

$$\begin{aligned}
\|x_k - x^*\| &\le \|x_1 - x^*\| + \sum_{j=2}^{k}\|x_j - x_{j-1}\| \\
&\le \|x_1 - x^*\| + \sum_{j=2}^{k}(\lambda c)^{j-2}\|x_2 - x_1\| \\
&\le \alpha + \frac{c}{1 - \lambda c}\|y' - y''\| \\
&\le \alpha + \frac{2c\beta}{1 - \lambda c} \le a
\end{aligned} \tag{24}$$

and

$$\|y'' - h(x_n) - v^*\| \le \|y'' - y^*\| + \|h(x_n) - h(x^*)\| \le \beta + \lambda\Big(\alpha + \frac{2c\beta}{1 - \lambda c}\Big) \le b.$$

Then there exists x_{n+1} ∈ G^{-1}(y'' - h(x_n)) such that

$$\|x_{n+1} - x_n\| \le c\|h(x_n) - h(x_{n-1})\| \le c\lambda(c\lambda)^{n-2}\|x_2 - x_1\| = (c\lambda)^{n-1}\|x_2 - x_1\|.$$

The induction step is complete. Since cλ < 1 the sequence {x_n} is
a Cauchy sequence, hence convergent to, say, x'' which is in B_a(x*)
because of (24). Passing to the limit in (23) we obtain that x'' ∈
G^{-1}(y'' - h(x'')) = (h + G)^{-1}(y'') and we are done. □
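
The iteration used in this proof, x_{k+1} ∈ G^{-1}(y'' - h(x_k)) started from x_1 = x', can be run directly when G^{-1} is single-valued. The sketch below uses assumed one-dimensional data that are not from the paper: G(x) = 2x, so G^{-1}(v) = v/2 is Lipschitz with c = 1/2, and h(x) = λ cos(x) with λ = 1/2, so λc = 1/4. The names `G_inv`, `h` and `iterate` are hypothetical.

```python
import math

C_GROWTH = 0.5   # G(x) = 2x, so G^{-1}(v) = v / 2 has Lipschitz constant c = 1/2
LAM = 0.5        # h(x) = LAM * cos(x) is Lipschitz with constant LAM

def h(x):
    return LAM * math.cos(x)

def G_inv(v):
    return v / 2.0

def iterate(y, x_start, steps):
    """Return [x_1, x_2, ...] with x_1 = x_start and x_{k+1} = G^{-1}(y - h(x_k))."""
    xs = [x_start]
    for _ in range(steps):
        xs.append(G_inv(y - h(xs[-1])))
    return xs

y1, y2 = 0.3, 0.7
# x' in (h + G)^{-1}(y'), computed here by the same contraction to high accuracy.
x_prime = iterate(y1, 0.0, steps=200)[-1]
# Sequence from the proof: start at x' and switch the right-hand side to y''.
iterates = iterate(y2, x_prime, steps=60)
x_second = iterates[-1]

step_sizes = [abs(b - a) for a, b in zip(iterates, iterates[1:])]
growth = C_GROWTH / (1.0 - LAM * C_GROWTH)
print("first step |x_2 - x_1|      :", step_sizes[0])
print("|x'' - x'|                  :", abs(x_second - x_prime))
print("c / (1 - lam c) * |y' - y''|:", growth * abs(y2 - y1))
```

In this example the first step has length c|y' - y''|, later steps shrink at least by the factor λc, and the limit x'' solves y'' = h(x'') + G(x'') with |x'' - x'| ≤ (c/(1 - λc)) |y' - y''|, which is the distance estimate required in the definition of Aubin continuity.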

Without changes, Lemma 4.1 can be stated in abstract spaces, i.e.
by replacing R^n with a complete metric space and R^m with a linear metric
space with a shift-invariant metric.

An application to the maps S and L defined in Section 2 gives us the 
following analog to Corollary 2.1: 

Corollary 4.1 The following are equivalent:

(i) The map L is Aubin continuous at (v*, x*).

(ii) The map S is Aubin continuous at (v*, x*).

The next corollary is a generalization of the previous one; it shows that 
the Aubin property is persistent when passing from S to the map defined 
by the linearization at a point close to the reference point (v*,x*). 

Corollary 4.2 The following are equivalent:

(i) The map S is Aubin continuous at (v*, x*).

(ii) There exist constants δ, a, b and c such that for every (v', x') ∈
B_δ((v*, x*)) ∩ graph S the map (f(x') + ∇f(x')(· - x') + F(·))^{-1} is
Aubin continuous at (v', x') with constants a, b and c.

Proof. Let (i) hold. From the properties of Aubin continuous maps
given after Definition 4.1, there exists a δ > 0 such that the map S is
Aubin continuous at any (v', x') ∈ B_δ((v*, x*)) ∩ graph S with constants
independent of the choice of (v', x'). We now apply Lemma 4.1 with
G = f + F and h(x) = -f(x) + f(x') + ∇f(x')(x - x') at a reference
point (v', x'). Let x', x_1, x_2 ∈ B_a(x*). Then

$$\begin{aligned}
\|h(x_1) - h(x_2)\| &= \|-f(x_1) + f(x_2) + \nabla f(x')(x_1 - x_2)\| \\
&\le \int_0^1 \|\nabla f(x_1 + t(x_2 - x_1)) - \nabla f(x')\|\,dt\,\|x_1 - x_2\| \le 2La\|x_1 - x_2\|.
\end{aligned}$$

Hence, by taking a smaller if necessary, the Lipschitz constant λ =
2La of h can be made small enough that λc < 1, where c is the
growth constant of S. Thus, the map (h + G)^{-1} = (f(x') + ∇f(x')(· -
x') + F(·))^{-1} is Aubin continuous as claimed in (ii).

If (ii) is satisfied, then of course for (v', x') = (v*, x*) we obtain that
the map L is Aubin continuous at (v*, x*), therefore S is Aubin continuous
at (v*, x*), from Corollary 4.1. □



The following is an analog of Corollary 2.2. 

Corollary 4.3 The following are equivalent:

(i) The map L is Aubin continuous at $(v^*, x^*)$.

(ii) The map P is Aubin continuous at $((v^*, x^*), x^*)$.

Proof. Let (i) hold and let a, b and c be the constants of Aubin
continuity of L. Choose $\alpha$ and $\beta$ such that the inequalities (22) hold
with $\lambda = \tfrac12 L\beta$. Take $\alpha$ and $\beta$ smaller if necessary such that
\[
L\left(\tfrac14\beta + 2\alpha\right) \le 1. \tag{25}
\]

Let $(v', x')$ and $(v'', x'')$ be in $B_{\beta/2}((v^*, x^*))$ and let
$z' \in P(v', x') \cap B_\alpha(x^*)$. We apply Lemma 4.1 with
\[
G(z) = f(x^*) + \nabla f(x^*)(z - x^*) + F(z)
\]
and
\[
h(z) = f(x'') + \nabla f(x'')(z - x'') - f(x^*) - \nabla f(x^*)(z - x^*).
\]
Let $z_1, z_2 \in B_\alpha(x^*)$. Then
\[
\|h(z_1) - h(z_2)\| \le \|\nabla f(x'') - \nabla f(x^*)\|\,\|z_1 - z_2\| \le \tfrac12 L\beta\,\|z_1 - z_2\|,
\]
that is, h is Lipschitz continuous with a Lipschitz constant $\lambda$. Hence,
from Lemma 4.1 the map $(h + G)^{-1}$ is Aubin continuous at
$(v^* + h(x^*), x^*)$ with constants $\alpha$, $\beta$ and $\gamma = c/(1 - \lambda c)$. Note that
\[
z' \in (h + G)^{-1}\bigl(v' + f(x'') + \nabla f(x'')(z' - x'') - f(x') - \nabla f(x')(z' - x')\bigr) \cap B_\alpha(x^*)
\]
and
\[
\begin{aligned}
&\|v' + f(x'') + \nabla f(x'')(z' - x'') - f(x') - \nabla f(x')(z' - x')
   - v^* - f(x'') - \nabla f(x'')(x^* - x'') + f(x^*)\| \\
&\quad \le \|v' - v^*\| + \|f(x^*) - f(x') - \nabla f(x')(x^* - x')\|
   + \|(\nabla f(x') - \nabla f(x''))(x^* - z')\| \\
&\quad \le \tfrac12\beta + \tfrac12 L(\beta/2)^2 + L\beta\alpha \le \beta,
\end{aligned}
\]
because of (25). Hence, there exists $z'' \in (h + G)^{-1}(v'')$ such that
\[
\begin{aligned}
\|z'' - z'\|
&\le \gamma\,\|v'' - v' - f(x'') - \nabla f(x'')(z' - x'') + f(x') + \nabla f(x')(z' - x')\| \\
&\le \gamma\,\|v'' - v'\| + \gamma\,\|f(x'') - f(x') - \nabla f(x')(x'' - x')\|
   + \gamma\,\|(\nabla f(x') - \nabla f(x''))(x'' - z')\| \\
&\le \gamma\,\|v'' - v'\| + L\gamma(3\beta/2 + \alpha)\,\|x' - x''\|.
\end{aligned}
\]

Thus P is Aubin continuous. The converse implication is immediate by
noting that $L(v) = P(v, x^*)$. □



Remark 4.1 By choosing sufficiently small constants $\alpha$ and $\beta$ for the
neighborhoods, the growth constant of P with respect to x can be made
arbitrarily small.

In the remaining part of this section we study the Newton method 
for Aubin continuous maps. For simplicity, we consider the unperturbed 
Newton method (5); the analysis can be carried over without major 
changes to the perturbed form (16). The following theorem shows that 
the Aubin property implies the existence of Newton sequences for all v
close to $v^*$ that are quadratically convergent to a solution $x(v)$ of (1):

Theorem 4.1 Suppose that the map S is Aubin continuous at $(v^*, x^*)$.
Then there exist constants $\kappa$, $\sigma$ and $\Theta$ such that for every $v \in B_\kappa(v^*)$
and every $\bar x \in S(v) \cap B_\sigma(x^*)$ the following holds: for every initial point
$x_0 \in B_\sigma(x^*)$ there exists a Newton sequence $\{x_n\}$ such that for
$n = 0, 1, 2, \dots$,
\[
\|x_{n+1} - \bar x\| \le \Theta\,\|x_n - \bar x\|^2. \tag{26}
\]



Proof. Let a, b and c be the constants whose existence is
claimed in Corollary 4.2; that is, for any
$(\bar v, \bar x) \in B_\delta((v^*, x^*)) \cap \operatorname{graph} S$
the map $(f(\bar x) + \nabla f(\bar x)(\cdot - \bar x) + F(\cdot))^{-1}$ is Aubin continuous at
$(\bar v, \bar x)$ with constants a, b and c independent of the choice of
$(\bar v, \bar x)$. We repeatedly apply Lemma 4.1 with
\[
G(x) = f(\bar x) + \nabla f(\bar x)(x - \bar x) + F(x)
\]
and
\[
h(x) = f(x_n) + \nabla f(x_n)(x - x_n) - f(\bar x) - \nabla f(\bar x)(x - \bar x).
\]

Choose $\sigma = \delta$ and $\kappa = \delta$ and let $v \in B_\kappa(v^*)$ and $\bar x \in S(v) \cap B_\sigma(x^*)$.
We will adjust the constant $\sigma$ after the first step of Newton's method.
Let $x_0 \in B_\sigma(x^*)$ and put n = 0. Then
$h(x) = f(x_0) + \nabla f(x_0)(x - x_0) - f(\bar x) - \nabla f(\bar x)(x - \bar x)$ and
\[
\|h(x_1) - h(x_2)\| \le \|\nabla f(x_0) - \nabla f(\bar x)\|\,\|x_1 - x_2\| \le 2L\sigma\,\|x_1 - x_2\|
\]
for every $x_1, x_2 \in R^n$. Note that
\[
(\bar x,\; v + f(x_0) + \nabla f(x_0)(\bar x - x_0) - f(\bar x)) \in \operatorname{graph}(h + G)
\]
and
\[
\|f(x_0) + \nabla f(x_0)(\bar x - x_0) - f(\bar x)\| \le 2L\sigma^2. \tag{27}
\]

We apply Lemma 4.1 with the so chosen h (for n = 0) and G and with
\[
\lambda = 2L\sigma, \quad \alpha = \sigma \quad \text{and} \quad \beta = 2L\sigma^2.
\]

It is a matter of simple calculation to adjust the constant $\sigma$ such that the
inequalities (22) hold; for example, we choose $\sigma$ so small that $2L\sigma c < 1$,
etc. Then, from Lemma 4.1, the map $(h + G)^{-1}$ is Aubin continuous
at $(v + f(x_0) + \nabla f(x_0)(\bar x - x_0) - f(\bar x), \bar x)$ with constants $\alpha$, $\beta$ and
$\gamma = c/(1 - \lambda)$. Hence, from (27) there exists
\[
x_1 \in (h + G)^{-1}(v),
\]
that is, a Newton step from $x_0$, such that
\[
\|x_1 - \bar x\| \le \gamma\,\|-f(x_0) - \nabla f(x_0)(\bar x - x_0) + f(\bar x)\| \le 2\gamma L\sigma^2. \tag{28}
\]
At this point we take $\sigma$ smaller if necessary such that
\[
\frac{2cL\sigma}{1 - 2L\sigma} < 1.
\]
Then, from (28), $x_1 \in B_\sigma(\bar x)$.

From now on the constant $\sigma$ is fixed. To obtain a Newton iterate
$x_2$ we apply Lemma 4.1 with the same G and with
$h(x) = f(x_1) + \nabla f(x_1)(x - x_1) - f(\bar x) - \nabla f(\bar x)(x - \bar x)$.
Here $x_1$ will play the role of $x_0$
in the same way as in the first step, so that the map $(h + G)^{-1}$ is
Aubin continuous at $(v + f(x_1) + \nabla f(x_1)(\bar x - x_1) - f(\bar x), \bar x)$ with constants
$\alpha$, $\beta$ and $\gamma$: the same constants as at the first step. Then there exists
$x_2 \in (h + G)^{-1}(v)$, that is, $x_2$ is a Newton step from $x_1$ for v, such that
\[
\|x_2 - \bar x\| \le \gamma\,\|-f(x_1) - \nabla f(x_1)(\bar x - x_1) + f(\bar x)\| \le \tfrac12\gamma L\,\|x_1 - \bar x\|^2.
\]

The Newton iterates $x_3, x_4, \dots$ are obtained in the same way. Taking
$\Theta = \gamma L/2$ the proof is complete. □



Recall that $N(x, v)$ is the set of all Newton sequences, that is,
sequences that satisfy (5) starting from the point x and associated with
the value v of the parameter. The constant sequence $\xi^*$ is an element of
$N(x^*, v^*)$.

Theorem 4.2 Suppose that the map S is Aubin continuous at $(v^*, x^*)$.
Then the map N is Aubin continuous at $((x^*, v^*), \xi^*)$.

Proof. From Corollaries 4.1 and 4.3, the map P is Aubin continuous
at $((v^*, x^*), x^*)$; let a, b be the associated constants for neighborhoods,
c the growth constant with respect to v and $\mu$ the growth constant with
respect to x. By taking a and b smaller if necessary, we can have $\mu < 1$.
Let $v', v'' \in B_b(v^*)$ and $x', x'' \in B_b(x^*)$, and let $\xi' \in N(x', v') \cap B_a(\xi^*)$,
$\xi' = \{x'_1, x'_2, \dots, x'_n, \dots\}$. Note that $x'_1 \in P(v', x') \cap B_a(x^*)$. Then there
exists $x''_1 \in P(v'', x'')$ such that
\[
\|x''_1 - x'_1\| \le c\,\|v'' - v'\| + \mu\,\|x'' - x'\|.
\]

Further, $x'_2 \in P(v', x'_1) \cap B_a(x^*)$, hence there exists $x''_2 \in P(v'', x''_1)$ such
that
\[
\|x''_2 - x'_2\| \le c\,\|v'' - v'\| + \mu\,\|x''_1 - x'_1\| \le c(1 + \mu)\,\|v'' - v'\| + \mu^2\,\|x'' - x'\|.
\]

By induction, we obtain that there exists a sequence $\xi'' \in N(x'', v'')$,
$\xi'' = \{x''_1, x''_2, \dots, x''_n, \dots\}$, such that
\[
\|x''_n - x'_n\| \le c(1 + \mu + \mu^2 + \dots + \mu^{n-1})\,\|v'' - v'\| + \mu^n\,\|x'' - x'\|.
\]




142 Asen L. Dontchev 



Taking the supremum norm we obtain that N is Aubin continuous with
constants a and b for the neighborhoods and growth constants $\mu$ for x
and $c/(1 - \mu)$ for v. □
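The passage to the supremum norm rests on the geometric series. A short worked version, using only the growth constants c and $\mu$ (with $0 \le \mu < 1$) from the proof above, might read:

```latex
\[
c\,(1 + \mu + \dots + \mu^{n-1}) = c\,\frac{1-\mu^{n}}{1-\mu} \le \frac{c}{1-\mu},
\qquad \mu^{n} \le \mu \quad (n \ge 1),
\]
so that, for every $n \ge 1$,
\[
\|x''_n - x'_n\| \le \frac{c}{1-\mu}\,\|v'' - v'\| + \mu\,\|x'' - x'\| .
\]
```

Since the right-hand side does not depend on n, the same bound holds for the supremum over n.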



In the above theorem $N(x, v)$ is the set of all Newton sequences
starting from x for v. From Theorem 4.1 we know that at least one of these
sequences will be convergent provided that x is sufficiently close to $x^*$
and v is sufficiently close to $v^*$. Denote by $\mathcal{N}(x, v)$ the subset of $N(x, v)$
containing all convergent Newton sequences starting from the point x
and associated with v. The following theorem is a kind of converse to
Theorem 4.2.

Theorem 4.3 If the map $\mathcal{N}$ is Aubin continuous at $((x^*, v^*), \xi^*)$, then
the map S is Aubin continuous at $(v^*, x^*)$.

Proof. Let $\mathcal{N}$ be Aubin continuous with constants a, b and c. Let
$v', v'' \in B_b(v^*)$ and let $x'$ be an element of $S(v') \cap B_a(x^*)$. The constant
sequence $\xi' = \{x', x', \dots, x', \dots\}$ is an element of $\mathcal{N}(x', v') \cap B_a(\xi^*)$.
From the Aubin property of $\mathcal{N}$ there exists
$\xi'' = \{x''_1, x''_2, \dots, x''_n, \dots\} \in \mathcal{N}(x', v'')$ such that for every $n \ge 1$
\[
\|x''_n - x'\| \le c\,\|v'' - v'\|. \tag{29}
\]

Since $\xi'' \in \mathcal{N}(x', v'')$, $\xi''$ is convergent, say to $x''$. Clearly, every
convergent Newton sequence for $v''$ is convergent to a solution of (1) for $v''$.
Then $x''$ is an element of $S(v'')$. Passing to the limit in (29) we obtain
that
\[
\|x'' - x'\| \le c\,\|v'' - v'\|.
\]

Thus S is Aubin continuous. □



Let us apply the above result to our illustrative optimization problem
(3). The basic condition here will be the Aubin continuity of the map
$X$ defined in (11) (or, equivalently, of the map $\mathcal{L}$ defined in (12)). If we
assume the Aubin property for $\mathcal{L}$, however, we end up having $\mathcal{L}$ locally
single-valued because of the following result obtained in [7]: Let A be an
$n \times n$ real matrix and let C be a convex polyhedral set. Then the map
$(A + N_C)^{-1}$ is Aubin continuous at $(v^*, x^*)$ if and only if it is locally
single-valued and Lipschitz continuous around $(v^*, x^*)$.

Using the terminology of [25], we call the property just described
Lipschitzian localization; we consider it in the following section.
5. LIPSCHITZIAN LOCALIZATION

The Lipschitzian localization property simply says that when restricted
to a neighborhood of a point in its graph, a possibly set-valued map
becomes a Lipschitz continuous (single-valued) function.

Definition 5.1 Let $\Gamma$ map $R^m$ to the subsets of $R^n$ and let
$(y^*, x^*) \in \operatorname{graph} \Gamma$. We say that $\Gamma$ has a Lipschitzian localization at
$(y^*, x^*)$ with constants a, b and c if the map $y \mapsto \Gamma(y) \cap B_a(x^*)$ is
single-valued (a function) and Lipschitz continuous in $B_b(y^*)$ with a
Lipschitz constant c.

If a map $\Gamma$ has a Lipschitzian localization at $(y^*, x^*)$ then it is Aubin
continuous at $(y^*, x^*)$ with the same constants. Conversely, if $\Gamma$ is Aubin
continuous at $(y^*, x^*)$ with constants a, b and c and, in addition, for some
positive constants $\alpha$ and $\beta$, $\Gamma(y) \cap B_\alpha(x^*)$ consists of at most one point
for every $y \in B_\beta(y^*)$, then $\Gamma$ has a Lipschitzian localization at $(y^*, x^*)$
with constants $a', b', c'$ provided that
\[
0 < a' < \min\{a, \alpha\} \quad \text{and} \quad 0 < b' < \min\{b, \beta, a'/c', (a - a')/c'\}.
\]

The following lemma is an inverse function theorem which can be 
proved in various ways. Its statement is parallel to Lemmas 2.1 and 4.1. 

Lemma 5.1 Let G map $R^n$ to the subsets of $R^m$ and let $G^{-1}$ have a
Lipschitzian localization at $(v^*, x^*)$ with constants a and b for
neighborhoods and c for growth. Let the nonnegative constants $\alpha$, $\beta$ and $\lambda$
satisfy
\[
\alpha \le a, \quad \lambda c < 1, \quad c(\beta + \lambda\alpha) \le \alpha, \quad \beta + \lambda\alpha \le b. \tag{30}
\]

Then for every function $h : R^n \to R^m$ which is Lipschitz continuous on
$B_\alpha(x^*)$ with a Lipschitz constant $\lambda$, the map $(h + G)^{-1}$ has a Lipschitzian
localization at $(v^* + h(x^*), x^*)$ with constants $\alpha$ and $\beta$ for neighborhoods
and $c/(1 - \lambda c)$ for growth.

Sketch of Proof. Let $\alpha, \beta, \lambda$ satisfy (30) and let h be Lipschitz
continuous on $B_\alpha(x^*)$ with a Lipschitz constant $\lambda$. Let $v \in B_\beta(v^*)$
and let $y = v + h(x^*)$, $y^* = v^* + h(x^*)$. Then, for $x \in B_\alpha(x^*)$,
$\|y - h(x) - v^*\| \le \|v - v^*\| + \|h(x) - h(x^*)\| \le \beta + \lambda\alpha \le b$.
It is easy to see that, for each $v \in B_\beta(v^*)$, the map
\[
x \mapsto \Phi(x) := G^{-1}(y - h(x)) \cap B_a(x^*)
\]
is a function from $B_\alpha(x^*)$ to $B_\alpha(x^*)$ and is a contraction, hence it has
a unique fixed point, say, $x(y)$, in $B_\alpha(x^*)$. Since
$x(y) \in (G + h)^{-1}(y) \cap B_\alpha(x^*)$, the map $(G + h)^{-1} \cap B_\alpha(x^*)$ is a
function defined (with nonempty values) on $B_\beta(y^*)$. Moreover, for any
$y', y'' \in B_\beta(y^*)$ we have
\[
\begin{aligned}
\|x(y') - x(y'')\|
&= \|G^{-1}(y' - h(x(y'))) \cap B_a(x^*) - G^{-1}(y'' - h(x(y''))) \cap B_a(x^*)\| \\
&\le c\,\bigl(\|y' - y''\| + \lambda\,\|x(y') - x(y'')\|\bigr),
\end{aligned}
\]
hence
\[
\|x(y') - x(y'')\| \le \frac{c}{1 - \lambda c}\,\|y' - y''\|.
\]

Thus $y \mapsto x(y)$ is Lipschitz continuous with a Lipschitz constant
$c/(1 - \lambda c)$. □
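The contraction argument in the sketch can be tried numerically in the simplest single-valued, one-dimensional setting. The following is a minimal sketch under stated assumptions: the maps G(x) = 2x and h(x) = 0.5 sin x, the helper name `solve_perturbed`, and the sample values of y are all illustrative choices, not taken from the text. Here $G^{-1}$ is Lipschitz with c = 1/2 and h is Lipschitz with $\lambda$ = 1/2, so $\lambda c$ = 1/4 < 1.

```python
import math

def solve_perturbed(y, g_inv, h, x0=0.0, tol=1e-12, max_iter=200):
    """Fixed-point iteration x <- G^{-1}(y - h(x)) for a single-valued G^{-1}."""
    x = x0
    for _ in range(max_iter):
        x_new = g_inv(y - h(x))
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")

# Illustrative data (assumptions): G(x) = 2x and h(x) = 0.5*sin(x).
c, lam = 0.5, 0.5                      # Lipschitz constants of G^{-1} and h
g_inv = lambda y: y / 2.0
h = lambda x: 0.5 * math.sin(x)

bound = c / (1.0 - lam * c)            # growth constant c/(1 - lambda*c)
y1, y2 = 0.3, -0.2
x1 = solve_perturbed(y1, g_inv, h)
x2 = solve_perturbed(y2, g_inv, h)
print(abs(x1 - x2), "<=", bound * abs(y1 - y2))
```

The printed inequality compares the observed change in the solution with the bound $c/(1 - \lambda c)$ times the change in y; for these data the bound equals 2/3.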



Applying Lemma 5.1 to the maps S, L and P defined in Section 2 we 
obtain: 

Corollary 5.1 The following are equivalent:

(i) The map L has a Lipschitzian localization at $(v^*, x^*)$.

(ii) The map S has a Lipschitzian localization at $(v^*, x^*)$.

(iii) The map P has a Lipschitzian localization at $((v^*, x^*), x^*)$.

In optimization, the existence of Lipschitzian localization of the
linearization map L is often called "strong regularity", a concept
introduced by Robinson; actually, the implication (i) $\Rightarrow$ (ii) is essentially the
contents of Robinson's implicit function theorem in [21]. A
characterization of the Lipschitzian localization property of a locally continuous
set-valued map in finite dimensions is provided by the so-called strict
graphical derivative, see [25], Theorem 9.54. A far reaching recent study
of optimization problems with solution maps having this property is
given in [18].

For our illustrative problem (3) the Lipschitzian localization property
of the solution map is equivalent to the so-called strong second-order
sufficient optimality condition which has the form:
\[
\langle u, \nabla^2 g(x^*) u \rangle > 0 \quad \text{for all nonzero } u \in K - K, \tag{31}
\]
where K is the critical cone at $(v^*, x^*)$ defined in Section 2. Specifically,
we have the following result which can be extracted from [7] or from
more recent papers [13, 17, 18]:

Theorem 5.1 The following are equivalent for the problem (3):

(i) The strong second-order sufficient optimality condition (31) holds
at $(v^*, x^*)$.

(ii) The Argmin map has a Lipschitzian localization at $(v^*, x^*)$.

By combining Theorem 5.1 with the last property of Aubin continuous
maps given after Definition 4.1 and the result in [7] given at the end of
Section 4, we obtain that the strong second-order sufficient optimality
condition is equivalent to the property that for every sufficiently small
$\varepsilon > 0$ the $\varepsilon$-stationary map
\[
v \mapsto X_\varepsilon(v) := \{x \in C \mid \operatorname{dist}(v - \nabla g(x), N_C(x)) \le \varepsilon\}
\]
is Aubin continuous at $(v^*, x^*)$.

Let us consider the Newton method (5) applied to the general model
(1). The analog of Corollary 3.1 and Theorem 4.1 has the following
form:

Theorem 5.2 Suppose that the map S has a Lipschitzian localization
at $(v^*, x^*)$. Then there exist constants $\kappa$, $\sigma$ and $\Theta$ such that for every
$v \in B_\kappa(v^*)$ and every initial point $x_0 \in B_\sigma(x^*)$ there exists a unique
Newton sequence $\{x_n\}$ and this sequence is quadratically convergent to
$\bar x := S(v) \cap B_\sigma(x^*)$ with a constant $\Theta$, that is,
\[
\|x_{n+1} - \bar x\| \le \Theta\,\|x_n - \bar x\|^2, \quad n = 0, 1, 2, \dots.
\]

Proof. The proof is completely analogous to the proof of Theorem
4.1 with Lemma 5.1 replacing Lemma 4.1. □
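The quadratic estimate can be observed in the simplest special case of a smooth scalar equation f(x) = v, where the Newton sequence is unique. The sketch below is only an illustration under stated assumptions: the choice f(x) = x^3 + x, the parameter value v = 2 (whose solution is $\bar x$ = 1), the starting point and the helper name `newton_sequence` are not from the text.

```python
def newton_sequence(f, df, v, x0, steps=6):
    """Newton iterates for f(x) = v: solve f(x_n) + f'(x_n)(x_{n+1} - x_n) = v."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + (v - f(x)) / df(x))
    return xs

# Illustrative data (assumptions): f(x) = x^3 + x, so f(1) = 2.
f = lambda x: x ** 3 + x
df = lambda x: 3.0 * x ** 2 + 1.0

v = 2.0
x_bar = 1.0
xs = newton_sequence(f, df, v, x0=1.3)
errors = [abs(x - x_bar) for x in xs]
# Ratios |x_{n+1} - x_bar| / |x_n - x_bar|^2, skipped once rounding dominates.
ratios = [e1 / e0 ** 2 for e0, e1 in zip(errors, errors[1:]) if e0 > 1e-7]
print(errors)
print(ratios)
```

The list `ratios` plays the role of the constant $\Theta$: it stays bounded while the errors collapse.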

The next result is the analog of Theorems 3.3, 4.3 and 4.4. 

Theorem 5.3 The following are equivalent:

(i) The map S has a Lipschitzian localization at $(v^*, x^*)$.

(ii) The Newton map N has a Lipschitzian localization at $((x^*, v^*), \xi^*)$.

Applied to the illustrative optimization problem (3) and the SQP
method (6), we obtain that the strong second-order sufficient optimality
condition (31) at $(v^*, x^*)$ is equivalent to the following: for every v near
$v^*$ and x near $x^*$ there exists a unique sequence of primal variables $x_n$
starting from x and which is quadratically convergent to the unique
solution $x(v)$ of (3) for v, for some sequence of Lagrange multipliers $p_n$.
Moreover the map from (x, v) to the set of all sequences $x_n$ starting from
x for v has a Lipschitzian localization at $((x^*, v^*), \xi^*)$.







For the nonlinear program (20), the Newton (SQP) step is performed
for the pair (x, y) with x being the primal variable and y the Lagrange
multiplier. Theorem 5.4 extends the characterization of the Lipschitzian
localization property of the Karush-Kuhn-Tucker map to the SQP map.
Specifically, if we assume that the reference solution is isolated, then
the Lipschitzian localization property of the SQP method is equivalent
to the combination of the linear independence condition for the active
constraints and the strong second-order sufficient optimality condition
at the reference point.
Abstract We compare and contrast a number of recent sequential quadratic
programming (SQP) methods that have been proposed for the solution
of large-scale nonlinear programming problems. Both line-search and
trust-region approaches are studied, as are the implications of
interior-point and quadratic programming methods.



1. INTRODUCTION 
1.1. PERSPECTIVES 

By the start of the 1980s, it was generally accepted that sequen- 
tial quadratic programming (SQP) algorithms for solving nonlinear pro- 
gramming problems were the methods of choice. Such a view was based 
on strong convergence properties of such algorithms, and reinforced in 
the comparative testing experiments of [41], in which SQP methods 
clearly outperformed their competitors. Although such claims of supe- 
riority were made for implementations specifically aimed at small-scale 
problems, — that is, those problems for which problem derivatives can 
be stored and manipulated as dense matrices — there was little reason 
to believe that similar methods would not be equally appropriate when 
the problem matrices were too large to be stored as dense matrices, but 
rather required sparse storage formats. Remarkably then, it is only in
the latter part of the 1990s that SQP methods for sparse problems have 
started to appear in published software packages, while sparse variants 
of the methods that SQP was supposed to have superseded (for instance 
MINOS, see [50], and LANCELOT, see [16]) have been used routinely 
and successfully during the intervening years. 

In our opinion, this curious divergence between what logically should 
have happened in the 1980s, and what actually came to pass may be 
attributed almost entirely to a single factor: quadratic programming 
(QP) methods (and their underlying sparse matrix technology) were not 
then capable of solving large problems. Witness the almost complete 
lack of software for solving large-scale (non-convex) quadratic programs 
even today, especially in view of the large number of available codes for 
the superficially similar linear programming problem. 

The purpose of this paper is to survey modern SQP methods, and 
to suggest why it is now reasonable to accept the widely-held view that 
SQP methods really are best. There have been a number of surveys 
of SQP methods over the past 20 years, and we refer the reader to 
[1, 17, 56, 57, 61]. Much of the material in this paper is covered in full 
detail in our forthcoming book on trust-region methods [19], which also 
contains a large number of additional references. 

1.2. THE PROBLEM 

We consider the problem of minimizing a (linear or nonlinear) function
f of n real variables x, for which the variables are required to satisfy
a set of (linear or nonlinear) constraints $c_i(x) \ge 0$, $i = 1, \dots, m$. For
simplicity, we ignore the possibility that some of the constraints might
be equations, since these are easily incorporated in what follows, nor
shall we consider any special savings that can be made if some or all of
the constraints have useful structure (e.g., might be linear). We remind
the reader that if $x_*$ is a local solution to the problem, and so long as a
so-called constraint qualification holds to exclude pathological cases, it
follows that the first-order criticality conditions
\[
g(x_*) = A^T(x_*)\,y_*, \quad c(x_*) \ge 0, \quad y_* \ge 0 \quad \text{and} \quad y_*^T c(x_*) = 0 \tag{1}
\]
will hold. Here $c(x)$ is the vector whose components are the $c_i(x)$,
$g(x) = \nabla_x f(x)$ is the gradient of f, $A(x) = \nabla_x c(x)$ is the Jacobian of
c, and $y_*$ are appropriate Lagrange multipliers. Notice that the first
requirement in (1) is that the gradient $\nabla_x \ell(x_*, y_*)$ of the Lagrangian
function $\ell(x, y) = f(x) - y^T c(x)$ should vanish. For future reference, we also
denote the Hessian of the Lagrangian function by $H(x, y) = \nabla_{xx} \ell(x, y)$,
and will let $c_-(x)$ be the vector whose i-th component is $\min(c_i(x), 0)$.
Throughout this paper, we shall make a blanket assumption that

A1. f and the $c_i$ have Lipschitz-continuous second derivatives (in the
region of interest).

Throughout, $\|\cdot\|$ will denote a generic norm. While there may be
good practical reasons for choosing a specific norm, and while some of
the given results have only been established in such a case, we suspect
there are very few places where general results in arbitrary norms are
not possible.

1.3. GENERIC SQP METHODS 

For a most transparent derivation of the basic SQP method, we note
that the final requirement in (1) (the complementarity condition) implies
that Lagrange multipliers corresponding to inactive constraints (those
for which $c_i(x_*) > 0$) must be zero. Thus, so long as the given
inequalities hold, (1) may equivalently be written as
\[
g(x_*) - A^T_{\mathcal{A}_*}(x_*)\,(y_*)_{\mathcal{A}_*} = 0 \quad \text{and} \quad c_{\mathcal{A}_*}(x_*) = 0, \tag{2}
\]
where the subscript $\mathcal{A}_*$ indicates the components corresponding to the
active set $\mathcal{A}_* = \{i \mid c_i(x_*) = 0\}$. Of course $\mathcal{A}_*$ depends on $x_*$, but
suppose for the time being that we know $\mathcal{A}_*$. We then note that, writing
$|\mathcal{A}_*|$ for the number of elements of $\mathcal{A}_*$, (2) is a set of $n + |\mathcal{A}_*|$
nonlinear equations in the $n + |\mathcal{A}_*|$ unknowns x and $y_{\mathcal{A}_*}$.

The best-known method for solving such systems (when it works)
is Newton's method, and the basic SQP method is simply Newton's
iteration applied to (2). This leads to an iteration of the form
\[
\begin{pmatrix} x_{k+1} \\ (y_{k+1})_{\mathcal{A}_*} \end{pmatrix}
= \begin{pmatrix} x_k \\ (y_k)_{\mathcal{A}_*} \end{pmatrix}
+ \begin{pmatrix} s_k \\ (v_k)_{\mathcal{A}_*} \end{pmatrix},
\]
where
\[
\begin{pmatrix} H_k & A^T_{\mathcal{A}_*}(x_k) \\ A_{\mathcal{A}_*}(x_k) & 0 \end{pmatrix}
\begin{pmatrix} s_k \\ -(v_k)_{\mathcal{A}_*} \end{pmatrix}
= - \begin{pmatrix} g(x_k) - A^T_{\mathcal{A}_*}(x_k)\,(y_k)_{\mathcal{A}_*} \\ c_{\mathcal{A}_*}(x_k) \end{pmatrix}
\tag{3}
\]
to correct the guess $(x_k, (y_k)_{\mathcal{A}_*})$. Here $H_k$ is a "suitable" approximation
of $H(x_k, y_k)$, where the nonzero components of $y_k$ are those of $(y_k)_{\mathcal{A}_*}$.
Since this is a Newton iteration, we expect a fast asymptotic convergence
rate in many cases, so long as $H_k$ is chosen appropriately. Interestingly,
fast convergence does not require $H_k$ to converge to $H(x_*, y_*)$, and
considerable effort over the past 25 years has been devoted to obtaining
minimal conditions, along with practical choices of $H_k$, which permit
satisfactory convergence rates. We refer the interested reader to any of
the previously mentioned surveys, and the papers cited therein, for more
details.
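To make iteration (3) concrete, here is a minimal sketch on a two-variable test problem. Everything specific in it is an assumption made for illustration: the problem (minimize $x_0 + x_1$ subject to the single constraint $2 - x_0^2 - x_1^2 \ge 0$, treated as known to be active), the choice $H_k = H(x_k, y_k)$, the starting point, and the helper names `solve_linear` and `sqp_newton_step`. For this problem the solution is $x_* = (-1, -1)$ with multiplier $y_* = 1/2$.

```python
def solve_linear(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = acc / aug[r][r]
    return z

def sqp_newton_step(x, y):
    """One step of (3) for: min x0 + x1 with active constraint 2 - x0^2 - x1^2 = 0."""
    g = [1.0, 1.0]                          # gradient of f
    a = [-2.0 * x[0], -2.0 * x[1]]          # Jacobian row of the active constraint
    c = 2.0 - x[0] ** 2 - x[1] ** 2         # constraint value
    hess = 2.0 * y                          # H(x, y) = 2*y*I for this problem
    kkt = [[hess, 0.0, a[0]],
           [0.0, hess, a[1]],
           [a[0], a[1], 0.0]]
    rhs = [-(g[0] - a[0] * y), -(g[1] - a[1] * y), -c]
    s0, s1, minus_v = solve_linear(kkt, rhs)  # unknowns are (s_k, -v_k)
    return [x[0] + s0, x[1] + s1], y - minus_v

x, y = [-1.2, -0.8], 0.4                    # illustrative starting guess
for k in range(6):
    x, y = sqp_newton_step(x, y)
    print(k, x, y)
```

The third unknown of the linear system is $-v_k$, so the multiplier update is `y - minus_v`. A dense 3 by 3 solve is used only to keep the sketch dependency-free; it says nothing about how large sparse systems should be handled.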

Most revealingly, we may rewrite (3) as
\[
\begin{pmatrix} H_k & A^T_{\mathcal{A}_*}(x_k) \\ A_{\mathcal{A}_*}(x_k) & 0 \end{pmatrix}
\begin{pmatrix} s_k \\ -(y_{k+1})_{\mathcal{A}_*} \end{pmatrix}
= - \begin{pmatrix} g(x_k) \\ c_{\mathcal{A}_*}(x_k) \end{pmatrix},
\]
which are the first-order criticality conditions for the (equality
constrained) quadratic programming problem
\[
\min_{s \in R^n} \; s^T g(x_k) + \tfrac12 s^T H_k s \quad \text{subject to} \quad c_{\mathcal{A}_*}(x_k) + A_{\mathcal{A}_*}(x_k)\,s = 0,
\]
with $(y_{k+1})_{\mathcal{A}_*}$ being its Lagrange multipliers. Notice that the constraints
here are simply linearizations of the active constraints about the current
estimate of the solution. Further, this suggests that to avoid having
to estimate $\mathcal{A}_*$ in advance, it suffices to consider linearizations of all
of the constraints, and to solve the (inequality constrained) quadratic
programming problem
\[
\min_{s \in R^n} \; s^T g(x_k) + \tfrac12 s^T H_k s \quad \text{subject to} \quad c(x_k) + A(x_k)\,s \ge 0. \tag{4}
\]

This provides the basic SQP method: given an estimate $(x_k, y_k)$ and a
suitable $H_k$, solve (4) to find $s_k$, update $x_{k+1} = x_k + s_k$, and (if
necessary) adjust $y_{k+1}$ to provide convergence to $y_*$. Remarkably, [60] showed
that, so long as $x_0$ is sufficiently close to $x_*$, $H_0$ is sufficiently close to
$H(x_*, y_*)$, and $H_k = H(x_k, y_k)$ for $k \ge 1$, as well as

A2. the Jacobian of active constraints $A_{\mathcal{A}_*}(x_*)$ is of full rank,

A3. second-order necessary optimality conditions hold at $(x_*, y_*)$,
and

A4. strict complementarity slackness occurs (i.e., $(y_*)_i > 0$ if $c_i(x_*) = 0$),

the SQP iteration converges Q-superlinearly, and the set of constraints
which are active in (4) is precisely the set $\mathcal{A}_*$ for all sufficiently large
k. If $y_{k+1}$ are chosen to be the Lagrange multipliers for (4), the rate is
actually Q-quadratic.

The important assumption here is A2, since this ensures that the
Lagrange multipliers at $x_*$, as well as those for (4) for sufficiently large
k, are unique. If $A_{\mathcal{A}_*}(x_*)$ is not of full rank, the limiting multipliers
may not be unique, and the SQP method using the estimates obtained
from (4) may not converge Q-quadratically. Of course A2 is a relatively
strong first-order constraint qualification, and [67] shows that it is
possible to replace this assumption by a weaker one due to [45] while still
obtaining Q-quadratic convergence. To do so, the subproblem (4) must
be modified slightly to ensure that its Lagrange multipliers are (locally)
unique. In fact, Wright's subproblem is equivalent to minimizing an
augmented Lagrangian function for (4) with respect to x and
simultaneously maximizing with respect to y while ensuring that $y \ge 0$. To
ensure a Q-quadratic rate, the penalty parameter for the augmented
Lagrangian must approach zero as $O(\max[\|x_k - x_*\|, \|y_k - y_*\|])$.
Bonnans and Launay [4] and Hager [38] show that it is also possible to
remove A4 if A3 is strengthened.

Since the above iteration is essentially Newton’s method, we must, 
of course, be cautious since in general such methods are not globally 
convergent. There have been traditionally two types of globalization 
schemes, linesearch and trust-region methods, and it is these that we 
now consider. 

2. LINESEARCH METHODS 

A traditional linesearch SQP method computes $s_k$ by solving (4), and
then obtains $x_{k+1} = x_k + \alpha_k s_k$ for some appropriately chosen stepsize
$\alpha_k$. The stepsize is selected so that $x_{k+1}$ is closer in some way to a
critical point than its predecessor, and linesearch methods achieve this
by requiring that $\varphi(x_{k+1})$ is significantly smaller than $\varphi(x_k)$ for some
so-called merit function $\varphi$. A highly desirable property of any merit
function is that critical points of the merit function correspond to critical
points for the underlying nonlinear programming problem. The most
widely-used merit functions are non-smooth penalty functions of the
form
\[
\varphi(x, \sigma) = f(x) + \sigma\,\|c_-(x)\|, \tag{5}
\]
which depends on a positive penalty parameter $\sigma$, and also smooth exact
penalty functions of the form
\[
\varphi(x, z, \sigma) = f(x) - y^T(x)\,(c(x) - z)
 + \sigma\,(c(x) - z)^T \bigl(A(x)A^T(x) + Z\bigr)^{-1} (c(x) - z), \tag{6}
\]
where Z is a diagonal matrix with entries $z_i \ge 0$, and where $y(x)$ is
defined by
\[
\bigl(A(x)A^T(x) + Z\bigr)\,y(x) = A(x)\,\nabla_x f(x). \tag{7}
\]

Relevant references include [2, 21, 23, 44, 59]. Note that none of these
functions is actually ideal, since they may sometimes have critical points
at values which do not correspond to those for the underlying nonlinear
programming problem; these rogue values usually occur at points which
are locally least infeasible. However, it can be shown that critical points
for the two problems coincide so long as those for the merit function are
feasible, and so long as the penalty parameter is larger than a
problem-dependent critical value; for smooth exact penalty functions, a further
requirement like assumption A2 may also be required.

It is crucial that the SQP step $s_k$ and the merit function $\varphi(x)$ be
compatible, in the sense that the directional derivative (slope in the
smooth case) must be negative, for otherwise the linesearch may fail. In
many cases, this condition is guaranteed when the penalty parameter is
sufficiently large, and when $s_k^T H_k s_k > 0$. While the latter condition is
likely to hold asymptotically, there is little reason why it should be true
far from the solution, unless $H_k$ is itself positive definite. For this reason,
most active-set SQP methods work under the blanket assumption that
$H_k$ is positive definite, which is of course a far stronger assumption than
A3.

When the function (5) is used, the penalty parameter may have to
satisfy $\sigma \ge \|y_{k+1}\|_D$, where $y_{k+1}$ are the Lagrange multipliers for (4)
and $\|\cdot\|_D$ is the norm dual to $\|\cdot\|$. Such a condition is consistent
with the problem-dependent critical value alluded to earlier, namely that
$\sigma > \|y_*\|_D$. An a priori bound on the size of the penalty parameter for
(6) is harder to obtain, since it depends on the eigenvalues of

2.1. SECOND-ORDER CORRECTION 

The main disadvantage of functions like (5) (indeed, of any merit
function which simply tries to balance f against constraint infeasibility)
is that there is no guarantee that the SQP step together with a unit
stepsize $\alpha_k = 1$ will lead to a reduction of the merit function, however
close the iterates are to a critical point. Thus the full Newton (SQP) step
may not be taken, and the iterates fail to converge at the anticipated
Q-superlinear rate. Indeed, a famous example due to [46] shows that
this defect can actually occur. The Maratos effect happens because the
linearization of the constraints fails to take adequate account of their
nonlinear behaviour.

The idea of using a second-order correction to cope with the Maratos
effect first appeared in a number of contemporary papers (see [12, 25,
47]). The idea is to aim to replace the update $x_{k+1} = x_k + s_k$ by a
corrected update
\[
x_{k+1} = x_k + s_k + s_k^{\rm C}, \tag{8}
\]
where $s_k^{\rm C}$ corrects for the "second-order" effects due to the constraint
curvature. Let $\mathcal{A}_k$ be the set of active constraints at the solution to (4).
Then a general second-order correction is the solution to the system
\[
\begin{pmatrix} H_k^{\rm C} & A^T_{\mathcal{A}_k}(x_k + p_k) \\ A_{\mathcal{A}_k}(x_k + p_k) & 0 \end{pmatrix}
\begin{pmatrix} s_k^{\rm C} \\ -y_k^{\rm C} \end{pmatrix}
= - \begin{pmatrix} g_k^{\rm C} \\ c_{\mathcal{A}_k}(x_k + s_k) \end{pmatrix}
\tag{9}
\]
for some appropriate $p_k$ and $g_k^{\rm C}$. In order that the resulting step is
suitable, we require that $g_k^{\rm C}$ and $p_k$ are both small, indeed that
\[
g_k^{\rm C} = O\bigl(\|x_k - x_*\| \max[\|x_k - x_*\|, \|y_k - y_*\|]\bigr) \quad \text{and} \quad p_k = O(\|x_k - x_*\|). \tag{10}
\]

Moreover, we also require that $H_k^{\rm C}$ is uniformly positive definite on the
null-space of $A_{\mathcal{A}_k}(x_k + p_k)$ for $(x_k, y_k)$ close to $(x_*, y_*)$. Provided that
these conditions are satisfied, and so long as A2-A4 hold, it is possible
to show that the corrected update (8) is guaranteed to reduce the merit
function (5) close to $(x_*, y_*)$. Two popular choices are
\[
p_k = 0, \quad g_k^{\rm C} = 0, \quad \text{and} \quad H_k^{\rm C} = H_k,
\]
which gives the traditional second-order correction championed by [12,
25, 47], and
\[
p_k = s_k, \quad g_k^{\rm C} = \nabla_x \ell(x_k + s_k, y_{k+1}), \quad \text{and} \quad H_k^{\rm C} = H(x_k + s_k, y_{k+1}),
\]
which corresponds to a second SQP step, and provides the basis for
the "watchdog technique" suggested by [11]. Note that other authors
(for example, [53]) have also shown that a small number (> 1) of SQP
steps ensure that (5) decreases, but couch their proposal in the language
of the non-monotone descent methods made famous for unconstrained
minimization by [37]. In the linesearch context, a search should be made
along the arc $x_k + \alpha s_k + \alpha^2 s_k^{\rm C}$ with the expectation that ultimately $\alpha_k = 1$
and (8) will occur.
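The Maratos effect and the first of the two corrections above can be reproduced on a standard two-variable textbook example. The sketch below makes several assumptions that are not from the text: the test problem (minimize $2(x_1^2 + x_2^2 - 1) - x_1$ subject to the equality constraint $x_1^2 + x_2^2 - 1 = 0$, with solution (1, 0) and multiplier 3/2), the choice $H_k = I$, the equality-constraint analogue $f + \sigma |c|$ of the merit function (5), the values of `theta` and `sigma`, and all helper names.

```python
import math

def f(x):
    return 2.0 * (x[0] ** 2 + x[1] ** 2 - 1.0) - x[0]

def c(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0

def merit(x, sigma):
    """Non-smooth penalty f + sigma*|c| (equality-constraint analogue of (5))."""
    return f(x) + sigma * abs(c(x))

def sqp_step(x):
    """Solve min g^T s + 0.5*s^T s subject to c + a^T s = 0 (H_k = I, an assumption)."""
    g = [4.0 * x[0] - 1.0, 4.0 * x[1]]
    a = [2.0 * x[0], 2.0 * x[1]]
    aa = a[0] ** 2 + a[1] ** 2
    # Stationarity gives s = -g + y*a; the linearized constraint then fixes y.
    y = (a[0] * g[0] + a[1] * g[1] - c(x)) / aa
    return [-g[0] + y * a[0], -g[1] + y * a[1]]

def second_order_correction(x, s):
    """Minimum-norm s_c with c(x + s) + a(x)^T s_c = 0 (p_k = 0, g_k^C = 0, H_k^C = I)."""
    a = [2.0 * x[0], 2.0 * x[1]]
    aa = a[0] ** 2 + a[1] ** 2
    t = -c([x[0] + s[0], x[1] + s[1]]) / aa
    return [t * a[0], t * a[1]]

theta, sigma = 0.1, 2.0                      # sigma exceeds the multiplier 3/2
x = [math.cos(theta), math.sin(theta)]       # feasible point near the solution (1, 0)
s = sqp_step(x)
x_full = [x[0] + s[0], x[1] + s[1]]
sc = second_order_correction(x, s)
x_corr = [x_full[0] + sc[0], x_full[1] + sc[1]]

print("merit at x_k:           ", merit(x, sigma))
print("merit after full step:  ", merit(x_full, sigma))
print("merit after correction: ", merit(x_corr, sigma))
```

In this example the full step moves much closer to the solution yet raises the merit function, while adding the correction brings the merit function below its value at $x_k$.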

2.2. BOGGS, KEARSLEY AND TOLLE'S APPROACH

The Maratos effect does not occur for (6), and herein lies the
popularity of methods based on this function. Traditionally such functions have
been viewed somewhat unfavourably by most researchers since at a first
glance they require a Jacobian value and the solution of the linear system
(7) each time a function value is required; this may be very expensive
if a number of different trial steps are required during the linesearch.
This difficulty may be avoided by replacing (6) locally by a surrogate
(approximation) merit function in which $y(x)$ and $A(x)$ are replaced by
an appropriate y and A; it can be shown that such an approximation
is valid, and that with care global convergence properties are retained.
See [2] for details.

Boggs, Kearsley and Tolle [3] provide an implementation and positive
practical experience with such a method. Of particular note is that
instead of trying to solve (4) directly, they pick three promising estimates
of the required solution, and subsequently find the best approximation
to this solution in the subspace spanned by these three vectors, at least
one of the three directions being chosen to be a descent direction for
the merit function. A trust-region (see Section 3) is used to limit the
steps in the tests performed, and updated appropriately, but as yet this
enhancement has no theoretical underpinning; the current theory for
the linesearch version requires that $H_k$ be positive definite.

2.3. SNOPT 

SNOPT is a state-of-the-art linesearch-based SQP method for
large-scale nonlinear programming due to [34]. At present, SNOPT uses a
positive-definite approximation $H_k$ to the Hessian of the Lagrangian,
which exploits the fact that frequently many variables only appear
linearly in the problem formulation, and retains information about
previously encountered curvature via a limited-memory secant update formula;
we understand that a new version capable of using the exact Hessian
of the Lagrangian is being tested. Special techniques are used to ensure
that subsequent updates to $H_k$ maintain positive definiteness.
Feasibility with respect to linear constraints is attained from the outset. An
augmented Lagrangian merit function is used to assess steps in both x
and the Lagrange multiplier estimates y. The method is designed to be
flexible, in that in theory it can use any quadratic programming
algorithm, although by default it uses a null-space based active set method,
which slightly limits the size and type of problems which can be handled.
In practice, numerical tests have shown SNOPT to be most effective.

2.4. FEASIBLE POINT APPROACHES 

A particularly appealing idea is to ensure that all iterates remain
feasible, since then the objective function is itself a suitable merit
function, and additionally the linearized constraints are sure to be
consistent as $s = 0$ lies in the set
$\{s \mid c(x_k) + A(x_k)s \geq 0\}$. In a sequence of papers,
[5, 40, 52, 54] show that this is possible provided precautions are
taken. It is easy to show that the SQP direction $s_k$ from (4) at a
feasible point $x_k$ is a descent direction for $f(x)$ provided that
$H_k$ is positive definite. However, it may not be a feasible descent
direction, i.e., it may not lie in the set

$\{s \mid s^T g(x_k) < 0 \text{ and } s^T a_i(x_k) < 0 \text{ for all } i \in \mathcal{A}(x_k)\}$, (11)

since $s^T a_i(x_k)$ may be zero for one or more $i \in \mathcal{A}(x_k)$.
Thus an arbitrarily small step along $s_k$ may violate one or more of
the (nonlinear) constraints active at $x_k$. To avoid this difficulty,
any feasible descent direction $s_k^{\mathrm{F}}$, and the “tilted”
direction $s_k^{\mathrm{T}} = (1 - \rho_k)s_k + \rho_k s_k^{\mathrm{F}}$
for some $\rho_k \in (0, 1)$, are determined. The tilted direction is
itself a feasible descent direction, and, so long as $\rho_k$ converges
to zero sufficiently fast, retains the fast asymptotic convergence
properties of the original SQP direction. However, since these
properties only arise if an asymptotic unit step is taken, a
second-order correction rather like that from (9) may be necessary. The
FSQP algorithm of [54] is based on these ideas, and can be shown to be
globally convergent under suitable assumptions on $H_k$ and the
requirement that (11) is non-empty, which amounts to a constraint
qualification. Further, Q-superlinear convergence is achieved under the
additional assumptions A3 and A4. Although this method has only been
considered for small problems, it is not difficult to imagine how to
generalize it by taking approximate solutions to the various
subproblems. As with most linesearch methods, the requirement that
$H_k$ be positive definite is its major weakness.

3. TRUST-REGION METHODS 

The second important class of methods designed to ensure global
convergence of locally convergent minimization algorithms are
trust-region methods. Rather than controlling the step taken along the
SQP direction (having computed the direction), trust-region methods aim
to control the step at the same time as computing the search direction.
Such methods hold a distinct advantage over linesearch methods, in that
$H_k$ is not required to be positive definite.

To simplify our discussion, consider first the unconstrained
minimization of a nonlinear smooth function $f$. At the $k$-th
iteration, a model $m_k(x_k + s)$ of $f(x_k + s)$ is used. This model is
merely required to resemble $f$ increasingly accurately as $s$
approaches zero, and is believed to be a good approximation for all $s$
within a trust region $\|s\|_k \leq \Delta_k$ for some appropriate,
possibly iteration-dependent, norm $\|\cdot\|_k$ and radius
$\Delta_k > 0$. If this is the case, an approximate minimizer of
$m_k(x_k + s)$ should provide a good estimate of the minimizer of $f$
within the same region. The first stage of a trust-region method is thus
to compute a suitable approximate minimizer $s_k$ of $m_k(x_k + s)$
within the trust region. If our hypothesis is correct, we would then
expect $m_k(x_k) - m_k(x_k + s_k)$ to be a good approximation to
$f(x_k) - f(x_k + s_k)$; if this confidence is repaid, we set
$x_{k+1} = x_k + s_k$, and possibly increase the radius. On the other
hand, when $m_k(x_k) - m_k(x_k + s_k)$ and $f(x_k) - f(x_k + s_k)$ are
very different, our hypothesis is invalid, that is to say $\Delta_k$ is
too large. In this case, we set $x_{k+1} = x_k$, and ensure that
$\Delta_{k+1}$ is significantly less than $\Delta_k$. This extremely
simple framework is imbued with very powerful global convergence
properties under extremely modest assumptions (see, for example, [19]).
In practice, all that is required of the step is that it reduces the
model by at least a fixed fraction of the reduction that can be obtained
by approximately minimizing the model within the trust region along a
gradient-related direction (such as $-g(x_k)$). This one-dimensional
minimization problem is often trivial; the resulting point, the Cauchy
point, plays a key role in the convergence theory for trust-region
methods. A most important result is that if a Newton model (the first
three terms of a Taylor expansion) is used, and if the model is
minimized sufficiently accurately, the trust-region constraint will
become inactive asymptotically, and the resulting full Newton step will
provide a Q-superlinear convergence rate.
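
The framework just described can be sketched in a few lines. The
following is only an illustration under stated assumptions: the test
objective, the acceptance threshold `eta`, the radius update factors and
the function names are all hypothetical choices, and the step is simply
the Cauchy point of a Newton model rather than a more accurate model
minimizer.

```python
import math

def f(x):
    # Smooth test objective (an assumption for illustration); minimizer (1, -0.5).
    return (x[0] - 1.0) ** 4 + (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2

def grad(x):
    return [4.0 * (x[0] - 1.0) ** 3 + 2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]

def hess(x):
    return [[12.0 * (x[0] - 1.0) ** 2 + 2.0, 0.0], [0.0, 4.0]]

def trust_region_cauchy(x, radius=1.0, eta=0.1, tol=1e-8, max_iter=500):
    """Basic trust-region loop using the Cauchy point of a Newton model."""
    for _ in range(max_iter):
        g = grad(x)
        gnorm = math.hypot(g[0], g[1])
        if gnorm <= tol:
            break
        H = hess(x)
        Hg = [H[0][0] * g[0] + H[0][1] * g[1], H[1][0] * g[0] + H[1][1] * g[1]]
        curvature = g[0] * Hg[0] + g[1] * Hg[1]
        # Cauchy point: minimize the model along -g subject to ||s|| <= radius.
        t = radius / gnorm
        if curvature > 0.0:
            t = min(t, gnorm ** 2 / curvature)
        s = [-t * g[0], -t * g[1]]
        predicted = t * gnorm ** 2 - 0.5 * t * t * curvature  # m(x) - m(x + s)
        trial = [x[0] + s[0], x[1] + s[1]]
        actual = f(x) - f(trial)
        rho = actual / predicted
        if rho >= eta:
            x = trial              # model was trustworthy: accept the step
            if rho >= 0.75:
                radius *= 2.0      # and possibly enlarge the region
        else:
            radius *= 0.5          # model was poor: stay put, shrink the region
    return x, radius
```

Calling `trust_region_cauchy([-1.5, 2.0])` drives the iterates towards
the minimizer of the assumed test function; the ratio `rho` of actual to
predicted reduction is what decides acceptance and the radius update.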

Turning now to the constrained case, it is reasonable to expect to
replace the objective function by a suitable merit function, and to build
a model of this merit function. However, if we try to impose a
trust-region constraint $\|s\|_k \leq \Delta_k$ on top of the linearized
constraints $c(x_k) + A(x_k)s \geq 0$, we immediately see a difficulty.
Simply, if $c(x_k)$ is nonzero, the intersection of linearized
constraints with the trust region will be empty if $\Delta_k$ is too
small. Thus, the strategy outlined in the previous paragraph, in which
the radius is reduced until the model of the merit function proves to be
adequate, is flawed in the constrained case.

In this section, we consider a number of ways of avoiding potential 
devastation from this discovery. 
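
The incompatibility is easy to see in one dimension. The sketch below
assumes a single linearized inequality $c + a s \geq 0$ in one variable
and a trust region $|s| \leq \Delta$; both function names are
hypothetical.

```python
def linearization_compatible(c, a, radius):
    """One constraint, one variable: is {s : c + a*s >= 0, |s| <= radius} non-empty?"""
    # The largest value of c + a*s over |s| <= radius is c + |a|*radius.
    return c + abs(a) * radius >= 0.0

def smallest_compatible_radius(c, a):
    """Smallest radius for which the two sets intersect (infinite if a == 0 and c < 0)."""
    if c >= 0.0:
        return 0.0
    return float("inf") if a == 0.0 else -c / abs(a)
```

With `c = -1.0` and `a = 0.5` the linearized constraint needs
$s \geq 2$, so any radius below 2 leaves no feasible step, however
accurate the model of the merit function may be.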

3.1. S$\ell_1$QP-LIKE APPROACHES

This approach avoids the incompatibility issue altogether. Simply,
rather than considering an SQP method directly, we instead aim to
minimize the unconstrained, non-smooth penalty function (5). Since
(5) is non-smooth, we cannot appeal directly to trust-region theory for
smooth unconstrained minimization. However, the basic idea remains
valid. We model $\phi(x_k + s, \sigma)$ as

$m_k(x_k + s) = f(x_k) + s^T g(x_k) + \frac{1}{2} s^T H_k s + \sigma \|(c(x_k) + A(x_k)s)_-\|$, (12)

where $H_k$ reflects the curvature in both $f$ and $c$, and aim to
approximately minimize this model within the trust region. All that is
really required is to change the definition of the Cauchy point, since
the gradient may not exist at $x_k$. Instead of the negative gradient,
it suffices to consider the steepest descent direction

$d(x_k) = -\arg\min_{g \in \partial\phi(x_k, \sigma)} \|g\|$,

where $\partial\phi(x_k, \sigma)$ is the generalized gradient of
$\phi(x, \sigma)$ at $x_k$. Because of the polyhedral convex structure
of the non-differentiable term $\min(c, 0)$ in the definition of $\phi$,
this generalized gradient may be found by solving a linear or convex
quadratic program in the commonly occurring cases that use the $\ell_1$,
$\ell_2$ or $\ell_\infty$ norm. While computing the Cauchy point is thus
undoubtedly more expensive than in the smooth case, the alternative of
exactly minimizing (12) within the trust region is usually even more
expensive, since the latter problem may be non-convex (depending on
$H_k$). The original idea here is due to [24] and [26], Section 14.4,
who proposed minimizing (12) using the $\ell_1$ norm within an
$\ell_\infty$-norm trust region. The resulting so-called $\ell_1$QP
subproblem can be converted to an ordinary QP (with a given initial
feasible point) by adding additional variables, but is probably best
solved as is. A significant advantage of this method over almost all of
its competitors is that an independence assumption like A2 is not
required to assure global convergence to a critical point of the merit
function.
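
To make the non-smooth model concrete, the sketch below evaluates a
model of the form (12) for the $\ell_1$ norm, assuming inequality
constraints $c(x) \geq 0$ so that the penalty term sums the violations of
the linearized constraints. The function name and the dense
list-of-lists storage are illustrative assumptions only.

```python
def l1_model(f, g, H, c, A, s, sigma):
    """Evaluate f + s.g + 0.5 s.H s + sigma * ||(c + A s)_-||_1 at the step s."""
    n = len(s)
    linear = sum(g[i] * s[i] for i in range(n))
    quadratic = 0.5 * sum(s[i] * H[i][j] * s[j]
                          for i in range(n) for j in range(n))
    # ||(c + A s)_-||_1 is the sum of the violations of c + A s >= 0.
    violation = 0.0
    for ci, row in zip(c, A):
        linearized = ci + sum(row[j] * s[j] for j in range(n))
        violation += max(-linearized, 0.0)
    return f + linear + quadratic + sigma * violation
```

The kink introduced by `max(-linearized, 0.0)` is what prevents a direct
appeal to the smooth theory: the model is piecewise quadratic, and its
gradient is undefined wherever a linearized constraint is exactly zero.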

Asymptotically, so long as the penalty parameter is large enough, the
SQP and S$\ell_1$QP directions coincide, and thus we might expect a fast
asymptotic convergence rate provided that the trust-region constraint is
inactive. However, as we noted in Section 2, the SQP direction may
suffer from the Maratos effect, and the same is true of the S$\ell_1$QP
direction. Thus, the S$\ell_1$QP direction may not be acceptable, and
consequently the trust-region radius will be reduced to exclude this
step. The cure is exactly as before, namely a second-order correction
should be added to correct for constraint curvature. In view of (9), an
appropriate correction $s_k^C$ is obtained by minimizing

$\cdots + \sigma \|(c(x_k + s_k) + A(x_k + p_k) s^C)_-\|$

within the trust region $\|s_k + s^C\|_k \leq \Delta_k$. A fairly
intricate algorithm, based on such a correction and proposed by [25],
was shown by [72] to ensure that the trust-region radius is
asymptotically inactive, and thus the iterates can converge
Q-superlinearly under assumptions A2-A4. Perhaps more simply, all that
is required is that the trust-region radius is reset to at least a fixed
positive value whenever a successful step is taken, for then the trust
region will not ultimately interfere with the next step $s_k$ and, if
needed, the correction $s_k^C$. Thus either $s_k$ or $s_k + s_k^C$ will
ultimately be accepted, and the radius is subsequently bounded away
from zero. Convergence to a second-order critical point (one for which
(weak) second-order necessary conditions hold) may also be guaranteed,
so long as significant negative curvature is exploited in the model, and
that this negative curvature is reflected in the true problem; this is
the case, for example, if $H_k$ converges to $H(x_*, y_*)$.

To date, it is unclear whether it is better to update $\sigma$ as the
iteration proceeds, or to wait until a critical point of
$\phi(x, \sigma)$ has been found before doing so. The advantage of the
former is that a sequence of problems will not be solved, while the
disadvantage is that any automatic choice which aims to predict the
correct value may also over-estimate it, leading to a more poorly
conditioned penalty function.

3.2. VARDI-LIKE APPROACHES 

The remaining approaches to be considered may be termed composite-step
methods. A composite step $s_k$ is computed as the sum of two
components $q_k$ and $t_k$, each of which has different aims. The
(quasi-) normal component $q_k$ is simply intended to improve the
linearized infeasibility as much as possible while satisfying the
trust-region constraint. Thus the merit function is ignored in this part
of the computation. By contrast, the tangential component $t_k$ aims not
to degrade the improved infeasibility obtained in the normal step, while
now concentrating on reducing a model of the merit function.

For simplicity, we shall suppose in this and the next two sections, that
our constraints are equations, $c(x) = 0$; we shall return to the
inequality case in Section 3.5. Recognising that the set

$\mathcal{F}_k = \{q \mid c(x_k) + A(x_k)q = 0 \text{ and } \|q\|_k \leq \Delta_k\}$

may be empty, Vardi [66] and Byrd, Schnabel and Shultz [8] instead
replace the linearized constraints by $\alpha_k c(x_k) + A(x_k)q = 0$ for
some $0 < \alpha_k \leq 1$, where $\alpha_k$ is chosen so that

$\mathcal{F}_k(\alpha_k) = \{q \mid \alpha_k c(x_k) + A(x_k)q = 0 \text{ and } \|q\|_k \leq \Delta_k\}$

is non-empty. Clearly $\mathcal{F}_k(0)$ is non-empty, and any value
$\alpha_k \leq \alpha_{\max}$ is also suitable, where $\alpha_{\max}$ is
the greatest $\alpha$ in $(0, 1]$ such that

$\min_{\|q\|_k \leq \Delta_k} \|\alpha c(x_k) + A(x_k)q\| = 0$.

As finding $\alpha_{\max}$ may be expensive (it may require the
computation of the projection $q^P(x_k)$ of the origin onto the set
$\{q \mid c(x_k) + A(x_k)q = 0\}$), in practice an approximation $q_k^P$
to $q^P(x_k)$ that satisfies $c(x_k) + A(x_k)q_k^P = 0$ is computed
instead, and $\alpha_k$ is subsequently found so that
$q_k = \alpha_k q_k^P$ lies within the trust region. In fact, this last
condition is strengthened so that $q_k$ lies strictly within the trust
region, allowing some “elbow room” for the subsequent tangential step.
Notice, however, the implicit requirement that the linearized
constraints be compatible, which is the weakest point of the whole
approach.
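
A minimal sketch of this scaled normal step follows, assuming a single
linearized equality constraint $c + a^T q = 0$ with nonzero gradient $a$
and the Euclidean norm, so that the projection of the origin has a closed
form. The shrink factor `tau` (providing the elbow room) and the function
name are hypothetical.

```python
import math

def vardi_normal_step(c, a, radius, tau=0.8):
    """Scaled normal step for ONE linearized equality c + a.q = 0 (a nonzero)."""
    aa = sum(ai * ai for ai in a)
    # Minimum-norm solution of c + a.q = 0, i.e. the projection of the origin.
    q_p = [-c * ai / aa for ai in a]
    norm_q_p = math.sqrt(sum(qi * qi for qi in q_p))
    # Largest alpha in (0, 1] keeping alpha*q_p inside the shrunken region.
    alpha = 1.0 if norm_q_p <= tau * radius else tau * radius / norm_q_p
    return alpha, [alpha * qi for qi in q_p]
```

By construction the returned step satisfies the shifted constraint
$\alpha c + a^T q = 0$, and has norm at most `tau * radius`, which leaves
room for a tangential step.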

Having found the normal step, the tangential step is chosen to reduce a
model of the merit function. Specifically, if we consider a merit
function of the form

$\phi(x, \sigma) = f(x) + \sigma \|c(x)\|$,

and model $\phi(x_k + s, \sigma)$ by
$m_k(x_k + s) = m_k^F(x_k + s) + \sigma m_k^C(x_k + s)$, where

$m_k^F(x_k + s) = f(x_k) + s^T g(x_k) + \frac{1}{2} s^T H_k s$
and $m_k^C(x_k + s) = \|c(x_k) + A(x_k)s\|$,

we see that following the normal step, $m_k^C(x_k + q_k)$ will have
decreased, but $m_k^F(x_k + q_k)$ may have increased. To cope with this,
we pick the tangential step so that

$m_k^C(x_k + q_k + t_k) = m_k^C(x_k + q_k)$ and $m_k^F(x_k + q_k + t_k) \leq m_k^F(x_k + q_k)$

(the latter inequality being strict unless $g(x_k) + H_k q_k = 0$) by
approximately solving the problem

minimize $t^T (g(x_k) + H_k q_k) + \frac{1}{2} t^T H_k t$
subject to $A(x_k)t = 0$ and $\|t\| \leq \Delta_k - \|q_k\|$.

A suitable Cauchy point for this problem is readily available. So long
as the linearized constraints are compatible, both normal and tangential
steps satisfying the above requirements may be computed using suitable
conjugate-gradient methods, the accuracy required being measured by
suitable measures of the violation of the criticality conditions for the
underlying problem.
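
One way to obtain a Cauchy point for the tangential problem is to take
the steepest descent direction projected onto the null space of the
constraint gradients. The sketch below assumes a single constraint row
`a`, the Euclidean norm and dense storage; the helper names are
hypothetical, and a practical method would continue with
conjugate-gradient iterations from this point.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def tangential_cauchy_step(g, H, a, q, radius):
    """Cauchy step for: min t.(g + H q) + 0.5 t.H t  s.t.  a.t = 0, ||t|| <= radius - ||q||."""
    g_hat = [gi + hi for gi, hi in zip(g, matvec(H, q))]
    # Project -g_hat onto the null space of the single constraint row a.
    shift = dot(a, g_hat) / dot(a, a)
    p = [-(gi - shift * ai) for gi, ai in zip(g_hat, a)]
    p_norm = math.sqrt(dot(p, p))
    room = radius - math.sqrt(dot(q, q))
    if p_norm == 0.0 or room <= 0.0:
        return [0.0] * len(g)
    beta = room / p_norm                         # step to the boundary
    curvature = dot(p, matvec(H, p))
    if curvature > 0.0:
        beta = min(beta, dot(p, p) / curvature)  # minimizer along p
    return [beta * pi for pi in p]
```

Because `p` is orthogonal to `a`, the step leaves the linearized
infeasibility reached by the normal step unchanged while reducing the
quadratic model.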

Note that there is no a priori guarantee that the separate choices of
$q_k$ and $t_k$ provide $m_k(x_k + q_k + t_k) < m_k(x_k)$. However, since
$m_k^C(x_k + q_k + t_k) < m_k^C(x_k)$, one way of ensuring that the model
of the merit function does decrease is to increase $\sigma$ if necessary
as the iteration proceeds. A simple rule is to increase the parameter to
ensure that

$m_k(x_k) - m_k(x_k + q_k + t_k) \geq \nu \sigma (m_k^C(x_k) - m_k^C(x_k + q_k + t_k))$,

where the value $\nu \in (0, 1)$ is arbitrary but preferably very small.
These, then, are the essential ingredients in the algorithm, which
otherwise follows the standard trust-region paradigm. Any limit point of
such an algorithm can be shown to be first-order critical. Moreover, the
penalty parameter cannot grow arbitrarily large.
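
The penalty update can be made explicit. Writing $\delta_F$ and
$\delta_C > 0$ for the decreases in the objective and constraint parts of
the model over the composite step (symbols introduced here for
illustration), the rule requires
$\delta_F + \sigma \delta_C \geq \nu \sigma \delta_C$, which holds as soon
as $\sigma \geq -\delta_F / ((1 - \nu)\delta_C)$. A minimal sketch, with
hypothetical names and no safeguarding against rounding:

```python
def update_penalty(sigma, decrease_f, decrease_c, nu=1e-4):
    """Smallest sigma' >= sigma with  dF + sigma'*dC >= nu*sigma'*dC  (dC > 0)."""
    if decrease_c <= 0.0:
        raise ValueError("the constraint model must have decreased")
    if decrease_f + sigma * decrease_c >= nu * sigma * decrease_c:
        return sigma               # current parameter already suffices
    return -decrease_f / ((1.0 - nu) * decrease_c)
```

An implementation would probably add a small margin to the returned
value so that the test is not satisfied only with equality.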

Of course, as usual in methods based upon the non-smooth merit
function $\phi$, it does not follow automatically that a desirable rate
of convergence occurs. The cure, as always, is to include a second-order
correction satisfying (9), when needed. A suitable rule is to consider
a second-order correction only when the normal step $q_k$ lies well
within the trust region (by implication this step is feasible for the
linearized constraints) and when the original step $s_k = q_k + t_k$
does not provide a sufficient reduction in the merit function. In fact,
the actual form of second-order correction required depends on the form
of $s_k$. If $s_k$ is
the standard SQP step (3), then any second-order correction for which 
(10) holds is permitted. On the other hand, if the standard SQP step 
lies outside the trust-region, a specific second-order correction for which 
= 0 is advised — since the first-order criticality conditions for the 
model are not satisfied, there is little sense in trying to correct for them, 
but it is still important to try to correct for constraint curvature. It can 
then be shown that with the usual assumptions A2-A4, the algorithm 
sketched above converges Q-superlinearly so long as the SQP step is 
attempted (asymptotically) whenever possible, so long as the second- 
order correction is discarded if it lies too far outside the trust region, 
and so long as the trust-region radius is not reduced when the SQP step 
is acceptable but has a “small” component The same conditions 
suffice to ensure convergence to at least one second-order critical point 
under assumption A2 if $H_k = H(x_k, y_k)$ and $y_k$ converges to the
corresponding $y_*$.

3.3. BYRD-OMOJOKUN-LIKE
APPROACHES 

A different composite-step approach is due to (Byrd and) Omojokun 
[6, 51]. It forms the basis of the NITRO, ETR and BECTR algorithms 
of [7, 43, 55], respectively. This approach has a major advantage over 
that in the previous section in that there is no requirement that the 
linearized constraints be compatible, but is otherwise quite similar. 

The major, and essentially only, difference is in the computation of
the (quasi-) normal step. Rather than shifting the linearized constraint,
another possibility is to compute $q_k$ to approximately

minimize $\|c(x_k) + A(x_k)q\|$ subject to $\|q\| \leq \xi^N \Delta_k$ (13)

for some $0 < \xi^N < 1$. This problem may have a large number of
solutions; the minimum-norm solution will give a component which is
normal to the null space of $A(x_k)$. Since computing an exact solution
may be expensive, a cheaper option is to find a $q$ that gives a
reduction in $\|c(x_k) + A(x_k)q\|$ that is at least a fraction of that
achievable at a suitable Cauchy point for this problem, such a point
being

$q_k^C = -\alpha_k A^T(x_k) c(x_k)$, (14)

where $\alpha_k \geq 0$ minimizes
$\|c(x_k) - \alpha A(x_k) A^T(x_k) c(x_k)\|$ subject to
$\|\alpha A^T(x_k) c(x_k)\| \leq \xi^N \Delta_k$.

As before, such a requirement is satisfied at the first iteration of a 
suitable conjugate-gradient method, and subsequent conjugate-gradient 
steps may be used to further reduce the violation. From a theoretical 
point of view, the normal step needs to have a non-trivial component in 
the above-mentioned minimum-norm solution. 
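
A Cauchy-type normal step of this kind can be sketched as follows,
assuming the Euclidean norm, dense storage and a shrink factor `xi`
strictly between 0 and 1 (all names are hypothetical). The step moves
along $-A^T c$, the steepest descent direction for
$\frac{1}{2}\|c + Aq\|^2$ at $q = 0$, and stops either at the minimizer
along that direction or at the boundary of the shrunken region.

```python
import math

def cauchy_normal_step(c, A, radius, xi=0.8):
    """Cauchy step for: min ||c + A q||  s.t.  ||q|| <= xi * radius."""
    m, n = len(A), len(A[0])
    # Steepest-descent direction for 0.5*||c + A q||^2 at q = 0 is -A^T c.
    atc = [sum(A[i][j] * c[i] for i in range(m)) for j in range(n)]
    atc_norm = math.sqrt(sum(v * v for v in atc))
    if atc_norm == 0.0:
        return [0.0] * n           # already locally least infeasible
    aatc = [sum(A[i][j] * atc[j] for j in range(n)) for i in range(m)]
    curvature = sum(v * v for v in aatc)       # ||A A^T c||^2
    alpha = xi * radius / atc_norm             # step to the shrunken boundary
    if curvature > 0.0:
        alpha = min(alpha, atc_norm ** 2 / curvature)
    return [-alpha * v for v in atc]

def residual_norm(c, A, q):
    return math.sqrt(sum((ci + sum(aij * qj for aij, qj in zip(row, q))) ** 2
                         for ci, row in zip(c, A)))
```

No compatibility of the linearized constraints is needed here: the step
is well defined even when $c + Aq = 0$ has no solution inside the trust
region, or no solution at all.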

The resulting algorithm offers essentially the same guarantees as its
predecessor. So long as $A(x_k)$ is of full rank, it follows that
$\lim_{k \to \infty} A^T(x_k)c(x_k) = 0$, indicating that, at worst,
limit points are locally least infeasible. If the limiting Jacobian is
also of full rank, we deduce not only
$\lim_{k \to \infty} c(x_k) = 0$, but also that the remaining
first-order criticality conditions hold, and the penalty parameter
remains finite.

Turning to the issue of fast convergence, essentially the same
precautions as before may be used. Since we are ultimately interested in
using a full SQP step, we shall require that eventually either the
normal step lies on the “shrunken” trust-region boundary, i.e.,
$\|q_k\| = \xi^N \Delta_k$, or that a step that satisfies the linearized
constraints, and lies within the shrunken trust region, is possible,
i.e., $c(x_k) + A(x_k)q_k = 0$ and $\|q_k\| \leq \xi^N \Delta_k$. In
the latter case, if the standard SQP step (3) does not provide a sufficient
reduction in the merit function, a second-order correction is attempted. 
As before, the exact form depends upon whether the SQP step satisfies 
the trust-region constraint, in which case a general correction is allowed, 
or if the SQP step lies outside, in which case a restricted correction in 
which = 0 is used. The resulting algorithm then converges at a Q- 
superlinear rate under exactly the same conditions as its predecessor, 
and at least one limit point $x_*$ is second-order critical if
additionally $H_k = H(x_k, y_k)$ and $y_k$ converges to the
corresponding $y_*$.

3.4. CELIS-DENNIS-TAPIA-LIKE
APPROACHES

A third way of dealing with the possibility that the linearized
constraints and the trust region have no common feasible point is to
replace the former by

$\|c(x_k) + A(x_k)s\| \leq \theta_k$, (15)

where $\theta_k$ is chosen so that (15) and the trust-region bound
$\|s\|_k \leq \Delta_k$ can both be satisfied by some $s$. Clearly, since
we wish to reduce the infeasibility, we should insist at the very least
that

$\min_{\|s\|_k \leq \Delta_k} \|c(x_k) + A(x_k)s\| \leq \theta_k \leq \|c(x_k)\|$, (16)

while another possibility is to require

$\min_{\|s\|_k \leq \xi_1 \Delta_k} \|c(x_k) + A(x_k)s\| \leq \theta_k \leq \min_{\|s\|_k \leq \xi_2 \Delta_k} \|c(x_k) + A(x_k)s\|$, (17)

where $0 < \xi_2 \leq \xi_1 \leq 1$. Since solving problems of the form

$\min_{\|s\|_k \leq \xi \Delta_k} \|c(x_k) + A(x_k)s\|$

for some $0 < \xi \leq 1$, which are needed to ensure that $\theta_k$
satisfies (16) or (17), may be expensive, a cheaper possibility is to
find any step $q$ which lies within the trust region but which also
significantly reduces $\|c(x_k) + A(x_k)q\|$. The most popular choice is,
of course, the Cauchy step (14), but any step which further decreases
$\|c(x_k) + A(x_k)q\|$ is also possible.

Although the computation of a suitable shift $q$ to reduce the
infeasibility is reminiscent of the composite-step methods considered,
$q_k$ is actually only used to find

$\theta_k = \|c(x_k) + A(x_k)q_k\| \leq \|c(x_k)\|$, (18)

where the inequality in (18) is strict unless $c(x_k) = 0$. The overall
step is computed as an approximate solution to the problem

minimize $s^T g(x_k) + \frac{1}{2} s^T H_k s$ (19a)

subject to $\|c(x_k) + A(x_k)s\| \leq \theta_k$ and $\|s\| \leq \Delta_k$, (19b)

for some appropriate Lagrange multiplier estimates $y_k$ and
approximation, $H_k$, to the Hessian of the Lagrangian. Methods based on
these suggestions have been proposed by Celis, Dennis and Tapia [10] and
Powell and Yuan [58]. Notice that by considering the whole feasible
region (19b) rather than successive normal and tangential components,
there is potential for greater reductions in the objective function (19a)
than with the previous two approaches. Unfortunately, this advantage
may also be regarded as its Achilles’ heel.

The main disadvantage of these approaches is apparent if one considers
(19). If polyhedral norms are used, this subproblem reduces to a
(possibly non-convex) inequality-constrained quadratic program which
may prove rather expensive to solve. On the other hand, if we choose
the $\ell_2$ norm, the subproblem involves two quadratic constraints. Thus
it is unclear if or how the powerful techniques which have been devel- 
oped for the simpler subproblem involving a single quadratic constraint 
(see, for example, [36, 48]) may or can be applied. In particular, it is 
far from evident how to compute the model minimizer, nor is it obvious 
how to derive a useful approximation — some results for convex models 
have been obtained by [39, 73] and others. Indeed, given that global and 
local convergence theories matching those of the other methods we have 
considered in this section can be developed, we can only surmise that 
the lack of any reported implementation based on the approach taken 
here may be attributed to this disadvantage. We mention in passing that 
all of the methods we are aware of that use this approach use smooth 
exact penalty functions like (6) to force global convergence, but methods 
based on (5) seem to be equally possible. 

3.5. INEQUALITY CONSTRAINTS 

We now return to the case where the constraints are inequalities,
$c(x) \geq 0$. There are two basic approaches. The first is to extend
the model problem to include inequalities. As we have already noted, it
may be that the set

$\mathcal{F}_k = \{s \mid c(x_k) + A(x_k)s \geq 0 \text{ and } \|s\|_k \leq \Delta_k\}$

is empty when $\Delta_k$ is small. Thus, we may instead have to be
content with a step which moves us towards a solution of the model
problem. The methods we have considered in the three previous sections
have achieved this by decomposing the step as $s_k = q_k + t_k$, where
the (quasi-) normal step $q_k$ is chosen to reduce the (linearized)
infeasibility, and the tangential step $t_k$ is then determined to
reduce the model without worsening the infeasibility attained during the
normal step. For the general problem, much the same approach is valid.

There are obvious variants of all of the three main approaches we
have discussed. Consider first the normal step. To extend the Vardi-like
methods, sketched in Section 3.2, we need to compute a trial step
$q_k^P$ which satisfies the linear constraints
$c(x_k) + A(x_k)s \geq 0$, and which is not significantly longer than
the projection onto the linearized feasible region. We then take a step
in this direction as far, or almost as far, as we can within the trust
region, and set $q_k = \alpha_k q_k^P$, where $\alpha_k$ is the
resulting step length. To extend the Byrd-Omojokun-like approaches of
Section 3.3, the normal step should be calculated by finding $q_k$ to
approximately

$\text{minimize}_{q \in \mathbb{R}^n} \; \|(c(x_k) + A(x_k)q)_-\|$ subject to $\|q\| \leq \xi^N \Delta_k$ (20)

for some $0 < \xi^N < 1$, essentially as we did in (13). Note that, when
the problem is defined in terms of the $\ell_1$, $\ell_2$ or
$\ell_\infty$ norm, and if a polyhedral trust region is used, (20) can
be reformulated as a linear or convex quadratic program, and thus, in
principle, there are effective methods for (approximately) solving it.
Finally, to extend the Celis-Dennis-Tapia-like approaches of Section
3.4, we merely need the normal step to give us at least as much
reduction in $\|(c(x_k) + A(x_k)q)_-\|$ as a step to a generalized
Cauchy point for this problem.

Turning to the tangential step, extensions to both the Vardi- and
Byrd-Omojokun-like approaches require that the step solves
approximately

minimize $t^T (g(x_k) + H_k q_k) + \frac{1}{2} t^T H_k t$
subject to $A(x_k)t \geq -\max[c(x_k) + A(x_k)q_k, 0]$ and $\|t\|_k \leq \Delta_k - \|q_k\|_k$.

Notice that the linearized infeasibility is made no worse, and attention
turns instead to reducing the model value. In theory, all that is
required is that the reduction in the model at $t_k$ is a positive
fraction of that attainable at a generalized Cauchy point, such as that
proposed by [15]. For Celis-Dennis-Tapia-like approaches, the tangential
step must be calculated to approximately

minimize $s^T g(x_k) + \frac{1}{2} s^T H_k s$
subject to $\|(c(x_k) + A(x_k)s)_-\| \leq \theta_k$ and $\|s\| \leq \Delta_k$, (21)

where $\theta_k = \|(c(x_k) + A(x_k)q_k)_-\|$. As before, this approach
is less attractive in practice than its predecessors as effective
methods for approximately minimizing (21) are not known. Other details
extend in an obvious way. In particular, the same merit functions as
before are appropriate, provided we replace every mention of $c(x)$ by
$c(x)_-$.

The second way of moving from the equality-constrained to the general
problem is to handle inequalities using barrier/interior-point methods
(see, for example, [14, 22, 32, 33, 62, 63, 65]). That is to say, we
embed the inequality problem within a sequence of barrier problems of
the form

$\text{minimize}_x \; f(x) + b(c(x), \mu_k)$,

where $b(c(x), \mu)$ is a barrier term like
$-\mu \sum_i \log c_i(x)$, and $\{\mu_k\}$ is a sequence of barrier
parameters which converge to zero from above. For this class of methods,
we insist on starting from a strictly feasible point for the inequality
constraints, that is that $c(x) > 0$, and we require all subsequent
iterates to remain strictly feasible for these constraints.
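
As a toy illustration of a barrier sequence, consider minimizing
$f(x) = x$ subject to $x \geq 0$ with a logarithmic barrier, so that each
barrier problem minimizes $x - \mu \log x$ and has the minimizer
$x = \mu$. The example problem, the damped Newton iteration and the
function names below are assumptions made for illustration, not a method
taken from the text.

```python
def barrier_minimizer(mu, x, tol=1e-12, max_iter=100):
    """Minimize x - mu*log(x) over x > 0 by damped Newton, keeping x strictly feasible."""
    for _ in range(max_iter):
        slope = 1.0 - mu / x                 # derivative of the barrier function
        if abs(slope) <= tol:
            break
        step = -slope * x * x / mu           # Newton step (second derivative mu/x^2)
        t = 1.0
        while x + t * step <= 0.0:           # back off until strictly feasible
            t *= 0.5
        x += t * step
    return x

def follow_barrier_path(mus, x=1.0):
    """Solve the barrier problems in turn, warm-starting each from the last."""
    path = []
    for mu in mus:
        x = barrier_minimizer(mu, x)
        path.append(x)
    return path
```

Running `follow_barrier_path([1.0, 0.1, 0.01, 0.001])` produces iterates
that stay strictly positive and approach the constrained minimizer
$x = 0$ as the barrier parameter decreases.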

A typical trust-region method for such a problem models the bar- 
rier term using either a Newton (primal) or quasi-Newton (primal-dual) 
approximation. However, since such quadratic models have little
influence in dissuading the iterates from violating one or more of the
inequality constraints, it is crucial to either adjust the shape of the
trust region
to keep the iterates feasible, or to add explicit extra constraints to the 
trust-region subproblems to do this (or both). The main difficulty when 
there are nonlinear inequality constraints present is that any additional 
constraints imposed on the trust-region subproblem may be nonlinear. 
For this reason, inequality constraints are often converted to equations
by introducing slack variables. That is, we replace $c(x) \geq 0$ by the
equivalent conditions

$c(x) - v = 0$ and $v \geq 0$,

and then we solve a sequence of equality constrained minimization
problems

$\text{minimize}_{x, v} \; f(x) + b(v, \mu_k)$ subject to $c(x) - v = 0$.

The advantage of this approach is that we believe that the methods given
throughout Sections 3.2-3.4 are well able to deal with nonlinear equality
constraints, while any of the barrier/interior-point, affine scaling, or
[13] algorithms are especially suited to linear, and particularly simple
bound,
constraints. Indeed, a careful combination of the Byrd-Omojokun and 
Coleman-Li approaches forms the basis of the algorithm proposed by [6] 
and implemented as NITRO by [7], while the method proposed by [71] 
(see also [68, 69, 70]) is essentially a Vardi-like primal-dual method. Both 
of these methods are reported to perform most effectively in practice. 

There are some disadvantages of adding slack variables. Firstly, we
have most definitely increased the dimension of the problem. To counter
this, it is important to realize that the dominant cost of most
algorithms (at least when function values are inexpensive) tends to be
that for the linear algebra. In practice, significant algebraic savings
may be made by recognising that slack variables only occur linearly in
the problem reformulation, and each slack variable is associated with a
single constraint. The second disadvantage is that a suitable scaling of
the slack variables is often difficult to find; in practice, it is more
usual to pick the slacks so that $c(x) - Dv = 0$ and $v \geq 0$, where
the diagonal matrix $D$ is supposed to reflect “typical” values of
$c(x)$, but the very fact that $c$ is nonlinear indicates that a
uniformly good $D$ may be hard to determine. This has further
repercussions for trust-region methods since it is usual to scale the
trust-region norm to account for different scalings of the variables. We
also note that in practice the trust-region scaling needs to reflect the
interaction between the nonlinear constraints and the simple bounds (see
[14]).

We conclude this short section on inequality constraints with the re- 
mark that blending good methods for coping with equality constraints 
with good ones for dealing with inequalities is an extremely active area 
of research. For this reason, we shall say no more here, but await further 
developments, and particularly comparisons of the numerous possibili- 
ties, with interest. 

3.6. FILTER METHODS 

The last method we shall consider is the youngest, and certainly one 
of the most promising. The central idea is to dispense with the idea of 
using a merit function as a means of encouraging global convergence as 
far as is practically possible, and instead to use a mechanism which is 
less likely to reject candidate iterates. One such mechanism is a so-called 
filter. 

Suppose $\theta(x)$ is some measure of the infeasibility of the
constraints at $x$, for example $\theta(x) = \|c(x)_-\|$. A filter is a
list of pairs $\{(f(x_i), \theta(x_i))\}$, with the property that no
member of the filter is dominated by another, that is there are no two
$(f(x_i), \theta(x_i))$ and $(f(x_j), \theta(x_j))$ $(i \neq j)$ for
which

$f(x_i) \leq f(x_j)$ and $\theta(x_i) \leq \theta(x_j)$.

The key point is that the filter may be used as a mechanism to accept
or reject candidate iterates: a candidate will only be rejected if it
gives “larger” values of both the function value and constraint
violation than have been observed before. Contrast this to a merit
function, which tries to combine these two (conflicting) requirements in
a somewhat arbitrary way. An SQP-filter method aims to use the filter as
a means of assessing iterates $x_k + s_k$, where $s_k$ is a suitable
approximation to the solution of the trust-region SQP subproblem

minimize $m_k(x_k + s)$ subject to $c(x_k) + A(x_k)s \geq 0$ and $\|s\|_k \leq \Delta_k$, (22)

where $m_k(x_k + s) = f(x_k) + s^T g(x_k) + \frac{1}{2} s^T H_k s$. The
filter evolves as new iterates are accepted; the new iterate (or rather
its $(f, \theta)$ pair) may be added to the filter, while the act of
adding a new pair can result in the removal of previous members which
are now dominated by the newcomer.

Of course, the reader may object immediately that such a simple- 
minded approach has obvious flaws. The first is, as always, that (22) 
may not have a solution because either the trust-region radius is too 
small, or because the linearized constraints are inconsistent. The cure 
is simply to abandon temporarily the objective function, and to enter a 
restoration phase, whose sole purpose is to reduce the infeasibility $\theta(x)$.
The end of the restoration phase is reached at $x_k + r_k$ at which either
the set $\{s \mid c(x_k + r_k) + A(x_k + r_k)s \ge 0 \text{ and } \|s\| \le \Delta_{k+1}\}$ is
non-empty for some $\Delta_{k+1} > 0$ and for which $(f(x_k + r_k), \theta(x_k + r_k))$ is
acceptable for the filter, or $x_k + r_k$ is a critical point for $\theta(x)$; in either
case, such a point may be achieved by (approximately) minimizing
$\theta(x)$. The second flaw is that it is easy to imagine a sequence of iterates
each of which is barely acceptable to the (current) filter, but whose
limit point is not critical — such a potential difficulty arises in most min- 
imization methods, and the cure as always is to require that the iterates 
provide a “sufficient” improvement in the filter. A suitable rule is that 
an acceptable iterate must satisfy 

$$f(x_k + s_k) \le f(x_j) - \gamma\,\theta(x_j) \quad \text{or} \quad \theta(x_k + s_k) \le (1 - \gamma)\,\theta(x_j)$$

for all $x_j$ in the filter, where $\gamma \in (0, 1)$.
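
As a rough illustration of this rule, the following Python sketch tests a trial pair against every filter entry with a margin $\gamma$. The function name `acceptable_with_margin`, the default `gamma=0.1`, and the sample values are assumptions made for the example, not values taken from [30].

```python
from typing import Iterable, Tuple


def acceptable_with_margin(
    f_new: float,
    theta_new: float,
    filter_pairs: Iterable[Tuple[float, float]],
    gamma: float = 0.1,
) -> bool:
    """Require sufficient improvement in f or in theta against every entry."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    return all(
        f_new <= f_j - gamma * theta_j or theta_new <= (1.0 - gamma) * theta_j
        for f_j, theta_j in filter_pairs
    )


pairs = [(10.0, 3.0), (8.0, 5.0)]
barely = acceptable_with_margin(9.9, 3.0, pairs)   # only marginally better than (10, 3)
clearly = acceptable_with_margin(9.0, 2.5, pairs)  # sufficient gain against both entries
print(barely, clearly)
```

The first trial pair would pass a plain domination test against `(10.0, 3.0)` but fails the margin test, which is the behaviour the "sufficient" improvement requirement is meant to enforce.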

Fletcher and Leyffer [30] demonstrate that an SQP-filter method based 
on the above, and including a number of other heuristics, is most effec- 
tive in practice. The stated goal of requiring minimal interference from 
the filter is vindicated, and evidence is provided to show that other SQP 
methods (specifically the S$\ell_1$QP method discussed in Section 3.1) frequently
suffer more interference from their merit functions. In order to
prove convergence of an SQP-filter method, specific rules for when to 
include an iterate in the filter, what sort of approximate step may be 
tolerated, and how to adjust the trust-region radius are required. The 
first such convergence result (for an SLP-filter) was provided by [31], and
this has now been extended to the SQP case by [29]. The step $s_k$ is computed
as a composite of a normal and a tangential step, essentially as we have considered
in the previous four sections; if an infeasible subproblem is detected
during the normal-step calculation, then the restoration phase is started
straight away. (A variation in which the step $s_k$ is computed as a whole
is also possible, although one may have to retreat to the composite step
under unfavourable circumstances.) Once $s_k$ has been computed, it is
rejected if either it is unacceptable to the filter or if $m_k(x_k + s_k)$ offers
a “sufficient” improvement over $m_k(x_k)$ but this predicted improvement
does not translate into an actual improvement in $f(x)$. The trust-region
radius is reduced whenever a step is rejected. The iterate is added to
the filter if either it leads to a restoration phase, or if it has been accepted
despite $m_k(x_k + s_k)$ not giving a “sufficient” improvement over
$m_k(x_k)$. Second-order convergence issues are still open, and are under
investigation.

4. QP METHODS 

Without a doubt, in our opinion, the primary reason SQP methods are 
back in the ascendant is that large-scale quadratic programming (QP) 
methods have matured considerably over the past few years. There are 
a number of reasons for this. At the start of the 1980s, the vast majority 
of QP methods (see the surveys by [26], Chapter 10, and [27], and the 
bibliography in [20]) were of the active set variety, most were specifically 
designed for convex ($H$ positive semi-definite) or even strictly convex ($H$
positive definite) problems, and few (if any) were capable of solving even
medium size problems (for exceptions, see [28, 35]). The latter defect 
was due to two factors. Firstly, the dominant linear algebraic require- 
ments usually treated all relevant matrices and associated factorizations 
as dense — while it was easy to anticipate using sparse factorizations, 
this ruled out some of the most successful (orthogonal transformation) 
methods developed for the dense case. Secondly, as problem size in- 
creased, the number of iterations rose quite rapidly — in the worst case, 
an exponential number of changes in the active set was possible, and 
while the expected and observed behaviour did not get close to such 
dire predictions, it was a cause for concern. 

By the turn of the decade, the theoretical (polynomially bounded)
promise of the interior-point linear programming (LP) approach of [42], and
its successors, had been shown to be realized in practice, and theo- 
retical extensions to convex QP were immediate — we note that, as in 
the LP case, only a small fraction of the methods proposed and anal- 
ysed have ever been implemented (for exceptions, see [9, 64]). Most 
of the implementations differ from their theoretical counterparts in or- 
der to obtain good practical performance, and all of them appear to 
perform considerably better than their worst-case polynomial bound.
It is now accepted that interior-point and active-set methods are useful
alternatives, but frequently the former are the methods of choice
when the number of variables is very large. Of course, modern $\ell_1$QP
methods often require the (approximate) solution of non-convex QPs, 
for which the above-mentioned interior-point methods cannot offer the 
same theoretical guarantees, since non-convex QP is known to be an NP- 
hard problem. Nonetheless, it is possible to construct interior-point-like 
methods, which are both globally convergent, and whose asymptotic con- 
vergence behaviour is similar to the locally convex case (see, for exam- 
ple, [14, 18, 65]). Early computational experience indicates considerable 
promise for large (say n ~ 10^) problems. 

One of the means by which methods for unconstrained minimization 
made the transition from small to large problems was the recognition 
that it is not necessary to solve the relevant model problem very ac- 
curately, at least when far from the solution. As we have indicated in 
Section 3, the same is true for SQP methods. However, at present, this 
either requires that the step is computed as a composite, in which two 
Cauchy points are determined (see Sections 3.2, 3.3, and 3.6), or as a 
single step in which an auxiliary computation may be necessary (see 
Section 3.1). As yet, the only method we are aware of that allows a 
direct truncation of the QP subproblem is the active set method of [49]. 
The subproblems in both active-set and interior-point methods may be 
solved by iterative (conjugate gradient-like) methods, although it is cru- 
cial, especially for the latter, to use suitable preconditioners. 
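
To make the idea of an inexact, preconditioned iterative solve concrete, here is a minimal pure-Python sketch of the conjugate gradient method with a diagonal (Jacobi) preconditioner. The choice of preconditioner, the function name `pcg`, the tolerance values, and the small matrix are all assumptions made for illustration; none is taken from the methods cited above, and practical codes use far more sophisticated preconditioners.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x: Vector, y: Vector) -> float:
    return sum(xi * yi for xi, yi in zip(x, y))


def pcg(a: Matrix, b: Vector, rel_tol: float = 1e-10, max_iter: int = 100) -> Vector:
    """Jacobi-preconditioned CG for a x = b, with a symmetric positive definite.

    Stops as soon as the residual norm falls below rel_tol * ||b||, so a loose
    rel_tol gives a deliberately truncated (inexact) solve.
    """
    inv_diag = [1.0 / a[i][i] for i in range(len(b))]
    x = [0.0] * len(b)
    r = list(b)                                  # residual b - a x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]   # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    target = rel_tol * dot(b, b) ** 0.5
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= target:
            break
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


H = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]  # assumed model Hessian
g = [1.0, 2.0, 3.0]                                      # assumed gradient
rhs = [-gi for gi in g]

s_tight = pcg(H, rhs)                # accurate solve of H s = -g
s_loose = pcg(H, rhs, rel_tol=0.1)   # truncated solve, as when far from a solution
print(s_tight, s_loose)
```

The loose tolerance stops the iteration early and returns a cheaper, less accurate step, which is the trade-off discussed in the paragraph above.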

Finally, as to which of the two QP alternatives we suggest is appro- 
priate for SQP methods, our answer is both! To justify this, we believe 
that interior-point methods probably hold the advantage for early SQP 
iterations when the active set has far from settled down. By contrast, 
when the active set is essentially known, a few active-set iterations are 
often cheaper than applying an interior-point method, since the latter 
is difficult to “warm start”, i.e., start from a known near optimal (but
possibly un-centred) vector of variables. Thus we contend that any new 
SQP method for large-scale nonlinear programming should have access 
to both interior-point and active-set non-convex QP algorithms. 

5. CONCLUSIONS 

In this paper, we have surveyed many of the most recent SQP algo- 
rithms for nonlinear programming. The majority of them are well-suited 
to large-scale problems, and recent numerical results (see, for exam- 
ple, [30, 71]) indicate that such methods are often considerably better 
than state-of-the-art implementations of other nonlinear programming
algorithms (such as MINOS and, it hurts us to say, LANCELOT). We
should add that the tests performed are all on medium-sized problems 
(say n ~ 10^-10^), but we expect this trend to continue for large ones 
(say n ~ 10^-10^) in the near future, provided that options for solving 
core linear systems by iterative (say, preconditioned conjugate-gradient)
methods are incorporated. It still remains to be seen if the major way in 
which modern SQP algorithms will benefit from interior-point technol- 
ogy is in the improvements these give to quadratic programming algo- 
rithms, or in the ways these suggest for handling inequality constraints. 
It also remains to be seen if new ideas, such as the SQP-filter method 
described in Section 3.6, fulfil their early promise.

Abstract In this paper, we study primal and dual formulations of multistage
stochastic programs (SP). Using a dual formulation, we discuss a decomposition/cutting
plane algorithm that can be used to solve such
problems. The algorithm, which is based on a scenario decomposition
derived from the dual statement of the problem, is best viewed as a
conceptual algorithm. Nevertheless, it lends itself to the use of sampled
data, and enhancements necessary to produce a computationally viable
method are discussed.



1. INTRODUCTION 

Stochastic programming (SP) is a powerful modeling tool that allows 
decision-making models to incorporate uncertain parameters. One of 
the main strengths of the SP methodology is its ability to consider the 
impact of a variety of scenarios when evaluating a proposed solution, in 
contrast to the more restrictive approach of deterministic optimization 
models, in which only a single scenario is considered. Also, despite the 
large-scale nature of stochastic optimization models, several successful 
applications of SP models have been reported in the literature (see, e.g., 
Cariño et al. [1994], Sen, Doverspike, and Cosares [1994]).

As in other areas of optimization, duality has implications for both 
SP modeling as well as the development of SP algorithms (see, e.g., 
Rockafellar and Wets [1991], Higle and Sen [1996a]). In this paper, we 
present an easily accessible development of a dual problem for a multistage
SP. This accessibility is due, in large part, to the use of stochastic
analogs of deterministic constructs that are already well understood in 
the mathematical programming literature. Based on this setting, we 
present a decomposition/cutting plane algorithm that may be used to 
solve a multistage stochastic program. The problem is decomposed by
“scenario”, as in Rockafellar and Wets [1991], rather than by time stage,
as in Birge [1985] and Gassmann [1990]. Within our framework, we make 
no distinctions regarding the nature of the random variables involved; 
discrete and continuous random variables are considered under a com- 
mon umbrella. 

This paper is organized as follows. In §2, we present a generic formu- 
lation of a multistage stochastic program, and suggest alternate repre- 
sentations of the so-called “nonanticipativity constraints”. These con- 
straints model the time-staged evolution of information, and are char- 
acteristic components of a stochastic program. In §3, we present a dual 
representation of a stochastic program. In §4, we propose a decomposi- 
tion/cutting plane algorithm for the solution of stochastic programs. It 
may be best to think of this as a “conceptual algorithm”, from which 
a computationally viable method can be obtained. As such, some of 
the enhancements necessary to obtain a viable algorithm are also dis- 
cussed in §4. Concluding remarks may be found in §5. While this paper 
is largely expository in nature, with most of the mathematical details 
omitted, the reader is referred to Higle and Sen [1997] for a fully de- 
tailed presentation of the duality results in §3, and to Sen, Higle and 
Rayco [1999] for a full presentation of a viable algorithm, along with 
preliminary computational results. 

2. PRIMAL PROBLEM 

In what follows, we consider a problem in which “decisions”, which 
we denote as x, and random variables are interwoven over time. An 
initial decision is made, after which an outcome of a random variable 
is observed. In response to the observation, a subsequent decision is 
made, after which another outcome is observed, etc. As a result of the 
multistage nature of the problems that we consider, our model is one 
in which both randomness and decisions evolve over time. In stage 1, 
we have the current (certain) information, denoted $\omega_1$. Information beyond
the first stage is uncertain, and is modeled through a sequence of
random variables $\tilde\omega_2, \ldots, \tilde\omega_T$. We use the index $t$ to denote a stage in
the decision problem, $t = 1, \ldots, T$, whereas $x = \{x_t\}$ and $\omega = \{\omega_t\}$ are
associated with decisions and outcomes, respectively. In this sense, $x_t$
indicates a decision made in stage $t$ and $\omega_t$ indicates an observation made
in stage $t$. The sequence of random variables $\tilde\omega = \{\tilde\omega_t\}$ is defined on
a probability space $(\Omega, \mathcal{A}, P)$, and in the stochastic programming literature,
any realization $\omega$ is termed a scenario. Although we consider
“randomness” as exogenous to the problem, so that a particular choice
of $x = \{x_t\}$ does not have a distributional impact upon $\tilde\omega$, a feasible
choice of $x$ is nonetheless dependent upon $\tilde\omega$. Thus, for each possible
scenario $\omega \in \Omega$, there is a set of feasible solutions, $X(\omega)$, and an objective
function $g(x, \omega)$ which influences the choice of $x$. When a particular
variable, such as $x$, is explicitly defined via the consideration of all possible
scenarios in $\Omega$, we will denote it as $\{x(\omega)\}_{\omega \in \Omega}$. Otherwise, we will
simply denote it as $x$. Finally, throughout our development, we will assume
that all vectors are appropriately dimensioned. Consequently, for
each possible scenario $\omega \in \Omega$, we have the following “scenario problem”,
also known as the “wait and see problem”:

$$\begin{array}{ll} \text{Min} & g(x, \omega) \\ \text{s.t.} & x \in X(\omega). \end{array} \qquad (P_\omega)$$

Let $x(\omega)$ denote the solution to the scenario problem $(P_\omega)$. Note that
the “scenario solutions”, $\{x(\omega)\}_{\omega \in \Omega}$, represent the result of posterior
optimization. That is, $x(\omega)$ would be an optimal response if one knew
for certain that scenario $\omega$ would occur. The difficulty, of course, is that
one is rarely (if ever) graced with such knowledge, and while $x(\omega')$ is an
optimal response to $\omega'$, it may be a disastrous response to $\omega''$. Thus, a
model that leads to a more balanced response, one whose cost under all 
scenarios has been explicitly considered, is necessary. 

To develop such a model, it is important to first note that scenarios 
that share a common history up to some point in time must implement 
the same decision at that time. That is, if we want to be able to imple- 
ment a decision at a particular time, it cannot depend on information 
that will only become available at a later time. This requirement leads to 
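constraints known as the nonanticipativity constraints. If we let $\mathcal{N}$ denote
the set of nonanticipative solutions, we may formulate a stochastic
programming model as follows:

$$\begin{array}{ll} \text{Min} & E[g(x(\tilde\omega), \tilde\omega)] \\ \text{s.t.} & x(\tilde\omega) \in X(\tilde\omega) \ \text{a.s.} \\ & x(\tilde\omega) \in \mathcal{N}. \end{array}$$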

We will assume throughout that the constraints $x(\tilde\omega) \in X(\tilde\omega)$ have a
property known as relatively complete recourse. That is, we assume that
if $x_t(\tilde\omega)$ appears to be feasible on the basis of all decisions and observations
made through time $t$, it cannot be rendered infeasible as a result of
some event that cannot be observed until some later time. With this assumption,
the scenarios are only coupled through the nonanticipativity
constraints. (This constraint qualification is common in the stochastic 
programming literature.) When formulating a stochastic program, rela- 
tively complete recourse can be incorporated through the judicious use 
of penalties for infeasibilities. 

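To model the nonanticipativity requirements, let $\Pi_t$ denote the operator
that projects any multistage entity onto the space corresponding to
the first $t$ stages. Then

$$\Pi_t \omega = (\omega_1, \ldots, \omega_t)$$

reflects the evolution of scenario $\omega$ through the first $t$ periods. Let $\Omega_t$
denote the set of possible “truncations” in period $t$, or

$$\Omega_t = \{\omega^t \mid \omega^t = \Pi_t \omega \text{ for some } \omega \in \Omega\}.$$

Note that $\Omega_T = \Omega$. Next, define the point-to-set map $\Pi_t^{-1}$ which identifies
sets of scenarios with common truncations in period $t$. That is,

$$\Pi_t^{-1}(\omega^t) = \{\omega \in \Omega \mid \Pi_t \omega = \omega^t\}.$$

Note that for each $t$, $\{\Pi_t^{-1}(\omega^t) \mid \omega^t \in \Omega_t\}$ forms a partition of $\Omega$. Clearly, if
the first $t$ components of $\omega'$ and $\omega''$ are identical, then $\omega' \in \Pi_t^{-1}(\Pi_t \omega'')$.
In this case, $\omega'$ and $\omega''$ are indistinguishable in period $t$, and nonanticipativity
requires that $x_t(\omega') = x_t(\omega'')$. We may use this notion
to express the nonanticipativity constraints in terms of state variables
$z_t(\omega^t)$, where $\omega^t \in \Omega_t$. Then the state variable representation of
the nonanticipativity constraints may be represented as:

$$x_t(\tilde\omega) - z_t(\Pi_t \tilde\omega) = 0 \ \text{ wp1}, \ \forall\, t. \qquad (1a)$$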

Note that these constraints will ensure that all scenarios that share a 
common history at period t yield a common solution in period t as well. 
Alternatively, we may express the nonanticipativity constraints using 
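conditional expectations. That is, we note that if $\tilde\omega'$ is defined on the
same probability space as $\tilde\omega$, then

$$x_t(\tilde\omega) - E[x_t(\tilde\omega') \mid \tilde\omega' \in \Pi_t^{-1}(\Pi_t \tilde\omega)] = 0 \ \text{ wp1}, \ \forall\, t \qquad (1b)$$

will ensure that $x(\tilde\omega)$ is nonanticipative. Either (1a) or (1b) may be used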
to represent the nonanticipativity requirements. There are additional 
representations as well, and the choice of representation will typically 
be guided by the solution method adopted. For the purposes of our 
algorithmic presentation, we will adopt the state variable representation,
(1a). Finally, for ease of exposition, we define the following extended
valued function

$$\varphi(x, \omega) = \begin{cases} g(x, \omega) & \text{if } x \in X(\omega) \\ \infty & \text{otherwise.} \end{cases} \qquad (2)$$



With this function, we may now specify a multistage stochastic program 
as follows: 



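$$\begin{array}{ll} \text{Min}_{x,z} & E[\varphi(x(\tilde\omega), \tilde\omega)] \\ \text{s.t.} & x_t(\tilde\omega) - z_t(\Pi_t \tilde\omega) = 0 \ \text{ wp1}, \ t = 1, \ldots, T. \end{array} \qquad (P)$$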
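
For a finite scenario set, the nonanticipativity constructs above reduce to simple grouping operations. The Python sketch below is a minimal illustration under that assumption: scenarios are tuples of stage outcomes, `bundles` groups scenarios that share a truncation, `is_nonanticipative` checks the state variable form (1a), and `conditional_mean` applies the conditional expectation in (1b). All names and the four-scenario data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Scenario = Tuple[str, ...]                 # (omega_1, ..., omega_T)
Decisions = Dict[Scenario, List[float]]    # x_t(omega) stored at index t - 1


def bundles(scenarios, t: int) -> Dict[Tuple[str, ...], List[Scenario]]:
    """Group scenarios by their truncation to the first t stages."""
    groups = defaultdict(list)
    for s in scenarios:
        groups[s[:t]].append(s)
    return dict(groups)


def is_nonanticipative(x: Decisions, tol: float = 1e-12) -> bool:
    """Check that scenarios sharing a history share the stage-t decision."""
    horizon = len(next(iter(x)))
    for t in range(1, horizon + 1):
        for members in bundles(x, t).values():
            ref = x[members[0]][t - 1]
            if any(abs(x[s][t - 1] - ref) > tol for s in members):
                return False
    return True


def conditional_mean(x: Decisions, prob: Dict[Scenario, float]) -> Decisions:
    """Replace x_t(omega) by its conditional mean over the scenario bundle."""
    z = {s: list(v) for s, v in x.items()}
    horizon = len(next(iter(x)))
    for t in range(1, horizon + 1):
        for members in bundles(x, t).values():
            mass = sum(prob[s] for s in members)
            mean = sum(prob[s] * x[s][t - 1] for s in members) / mass
            for s in members:
                z[s][t - 1] = mean
    return z


# Four equiprobable three-stage scenarios with hypothetical scenario solutions.
x = {
    ("now", "up", "up"): [1.0, 2.0, 3.0],
    ("now", "up", "down"): [2.0, 2.0, 1.0],
    ("now", "down", "up"): [3.0, 0.0, 4.0],
    ("now", "down", "down"): [2.0, 4.0, 0.0],
}
prob = {s: 0.25 for s in x}

z = conditional_mean(x, prob)
print(is_nonanticipative(x), is_nonanticipative(z))
```

The scenario solutions in `x` differ in stage 1 and therefore anticipate the future; averaging over bundles yields a policy `z` that satisfies the nonanticipativity constraints, although it need not be feasible or optimal for the original problem.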

3. DUAL PROBLEM 

As with the primal problem (P), the dual problem that we study is 
valid for both continuous and discrete random variables. In this sec- 
tion, we propose a stochastic analog of conjugate dual problems. One 
of the key features that distinguish these duals from their determinis- 
tic counterparts is the role played by multipliers associated with the 
nonanticipativity constraints (Wets [1975]). 

Throughout our development, we will introduce a number of “variables”
which correspond to measurable functions of random variables.
For example, $\{x(\omega)\}_{\omega \in \Omega}$, or equivalently, $x(\tilde\omega)$, is one such measurable
function mapping $\Omega$ to a finite-dimensional Euclidean space. We assume that
the functions $x \in \mathcal{L}_\infty$ are essentially bounded and $P$-measurable. Let
$\mathcal{L}_1$ denote the space of functions that are integrable with respect to $P$.
Given $\xi \in \mathcal{L}_1$, it is convenient to define the following linear operation:

$$\xi \circ x = \int_\Omega \xi(\omega)^T x(\omega)\, P(d\omega). \qquad (3)$$

Maintaining an eye toward the stochastic properties of the mathematical
program under consideration, it is interesting to note that this inner
product is equivalently stated as

$$\xi \circ x = E[\xi(\tilde\omega)^T x(\tilde\omega)], \qquad (4)$$

which we recognize as the expected value of the traditional counterpart
from deterministic mathematical programming. Throughout our development,
we use the representations in (3) and (4) interchangeably.
Consider the function $\varphi$ defined in (2), and its conjugate function

$$\varphi^*(\xi, \omega) = \sup_x \{\xi^T x - \varphi(x, \omega)\}. \qquad (5)$$

Relying on the state variable representation of the nonanticipativity constraints,
the following problem

$$\begin{array}{ll} \sup_{\xi} & -E[\varphi^*(\xi(\tilde\omega), \tilde\omega)] \\ \text{s.t.} & E[\xi_t(\tilde\omega) \mid \tilde\omega \in \Pi_t^{-1}(\Pi_t \tilde\omega')] = 0 \ \text{ wp1}, \ t = 1, \ldots, T \end{array} \qquad (D)$$

is dual to (P), as indicated in the following result from Higle and Sen
[1997].

Theorem 1 Let $\varphi(\cdot, \omega)$, as defined in (2), be a convex normal integrand,
and assume that (P) has relatively complete recourse. Let $v_p$ and $v_d$
denote the optimal values of (P) and (D), respectively. Then

(a) $v_p \ge v_d$.

(b) If (P) possesses an optimal solution denoted $(x, z)$, and $\partial\varphi(x(\omega), \omega)$
is nonempty (wp1), then there exists $\xi(\tilde\omega) \in \partial\varphi(x(\tilde\omega), \tilde\omega)$, wp1,
such that

$$E[\xi_t(\tilde\omega) \mid \tilde\omega \in \Pi_t^{-1}(\Pi_t \tilde\omega')] = 0 \ \text{(wp1)},$$

where $\tilde\omega$ and $\tilde\omega'$ are defined on the same sample space. Furthermore,
$-E[\varphi^*(\xi(\tilde\omega), \tilde\omega)] = v_d = v_p$.

Proof (See Higle and Sen [1997]).

It is interesting to note that (D) may be construed as a stochas- 
tic conjugate dual. Dual feasibility requires only that the conditional 
means of the dual vectors be zero at each decision point. For notational 
convenience, we let $\Xi$ denote the set of dual feasible solutions. In the
following section, we use this dual statement of the multistage stochastic 
programming problem to develop a cutting plane approximation of the 
problem. 
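
For a finite scenario set, dual feasibility and the operation $\xi \circ x$ can be checked directly. The sketch below is illustrative only: the function names `inner` and `is_dual_feasible` and the two-scenario data are assumptions. It also shows, for this small example, that a dual feasible $\xi$ gives $\xi \circ x = 0$ when $x$ is nonanticipative, since the multipliers average to zero over every bundle of indistinguishable scenarios.

```python
from typing import Dict, List, Tuple

Scenario = Tuple[str, ...]
Field = Dict[Scenario, List[float]]   # one scalar per stage and scenario


def inner(xi: Field, x: Field, prob: Dict[Scenario, float]) -> float:
    """Discrete form of xi o x = E[xi(omega)^T x(omega)]."""
    return sum(prob[s] * sum(a * b for a, b in zip(xi[s], x[s])) for s in prob)


def is_dual_feasible(xi: Field, prob: Dict[Scenario, float], tol: float = 1e-12) -> bool:
    """Conditional mean of xi_t must vanish on every stage-t scenario bundle."""
    horizon = len(next(iter(prob)))
    for t in range(1, horizon + 1):
        groups: Dict[Tuple[str, ...], List[Scenario]] = {}
        for s in prob:
            groups.setdefault(s[:t], []).append(s)
        for members in groups.values():
            mass = sum(prob[s] for s in members)
            cond_mean = sum(prob[s] * xi[s][t - 1] for s in members) / mass
            if abs(cond_mean) > tol:
                return False
    return True


# Hypothetical two-stage example with unequal scenario probabilities.
prob = {("a", "u"): 0.25, ("a", "d"): 0.75}
xi = {("a", "u"): [3.0, 0.0], ("a", "d"): [-1.0, 0.0]}
x_nonant = {("a", "u"): [5.0, 7.0], ("a", "d"): [5.0, -2.0]}   # common stage-1 decision
x_antic = {("a", "u"): [4.0, 7.0], ("a", "d"): [8.0, -2.0]}    # stage-1 decisions differ

print(is_dual_feasible(xi, prob))
print(inner(xi, x_nonant, prob), inner(xi, x_antic, prob))
```

In stage 2 every bundle is a single scenario, so dual feasibility forces the stage-2 multipliers to zero in this example; only the stage-1 multipliers carry information.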



4. A CUTTING PLANE ALGORITHM 

Our algorithmic development depends on the dual representation, (D). 
Assuming the existence of an optimal solution, we restate the problem 
as 



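$$\begin{array}{ll} \text{Max}_{\xi \in \mathcal{L}_1} & E[\text{Min}_{x(\omega)} \{\varphi(x(\omega), \omega) - \xi^T(\omega) x(\omega)\}] \\ \text{s.t.} & E[\xi_t(\tilde\omega) \mid \tilde\omega \in \Pi_t^{-1}(\Pi_t \tilde\omega')] = 0 \ \text{ wp1}, \ t = 1, \ldots, T. \end{array}$$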

Notice that in this representation, the scenarios are linked via the outer 
maximization, while the inner minimization is separable by scenario. As 
such, the inner minimization can be decomposed by scenario, with the 
outer maximization providing the coordination necessary to identify an 
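optimal solution.

To begin, let

$$\nu(\xi, \omega) = \text{Min}_x \{\varphi(x, \omega) - \xi^T x\} \le \varphi(x, \omega) - \xi^T x \quad \forall\, x, \omega$$

with equality holding if $x \in \text{argmin}_x \{\varphi(x, \omega) - \xi^T x\}$. Thus, given $\xi(\tilde\omega)$
and $x(\tilde\omega)$, we have

$$E[\nu(\xi(\tilde\omega), \tilde\omega)] \le E[\varphi(x(\tilde\omega), \tilde\omega)] - E[\xi(\tilde\omega)^T x(\tilde\omega)] = E[\varphi(x(\tilde\omega), \tilde\omega)] - \xi \circ x$$

with equality holding if $x(\tilde\omega) \in \text{argmin}_{x(\omega)} \{\varphi(x(\omega), \omega) - \xi(\omega)^T x(\omega)\}$,
with probability one. Note that $E[\varphi(x(\tilde\omega), \tilde\omega)]$ is constant with respect
to $\xi$. Thus, this affine bound provides the basis for a cutting plane
approximation to the expected value of the inner minimization.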

The Basic Scenario Decomposition method may be stated as follows: 

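Step 0. Initialize. $\varepsilon > 0$, an error tolerance, is given.
$k \leftarrow 0$, $\xi^k(\omega) \leftarrow 0$ for all $\omega \in \Omega$, $v^0 = \infty$.

Step 1. Solve the Subproblem. $k \leftarrow k + 1$.
Let $x^k(\omega) \in \text{argmin}_x \{\varphi(x, \omega) - \xi^{k-1}(\omega)^T x\}$ wp1.

Step 2. Update the Master Problem. Add the constraint
$v \le E[\varphi(x^k(\tilde\omega), \tilde\omega)] - \xi \circ x^k$ to the master problem.

Step 3. Solve the Master Problem. Let $(v^k, \xi^k)$ denote a
solution to:

$$\begin{array}{ll} \text{Max} & v \\ \text{s.t.} & v \le E[\varphi(x^j(\tilde\omega), \tilde\omega)] - \xi \circ x^j, \quad j = 1, \ldots, k \\ & \xi \in \Xi. \end{array}$$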

If $v^k$ exceeds the dual objective value attained in Step 1 by less than $\varepsilon$, then stop.

Otherwise, continue from Step 1. 

One recognizes that the Basic Scenario Decomposition algorithm is Kelley’s
cutting plane method applied to the dual statement of the multistage
stochastic program. As such, it follows that the iterates $\{\xi^k\}$
converge to an optimal solution. We note, however, that there are several 
impediments to the practical use of this method. 
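
The steps above can be sketched on a deliberately tiny instance. Everything in the following Python example is an assumption made for illustration: two equiprobable scenarios, a single scalar decision with $X(\omega) = [0, 10]$, objective $g(x, \omega) = (x - d_\omega)^2$, and a box on the multiplier so that the master problem stays bounded. Dual feasibility leaves one free parameter `lam`, with $\xi(\omega_1) = \lambda$ and $\xi(\omega_2) = -\lambda$, so the master problem is solved here by enumerating breakpoints rather than by an LP solver. The stopping test compares the master value with the best dual objective value found so far.

```python
from itertools import combinations
from typing import List, Tuple

PROB = (0.5, 0.5)               # two equiprobable scenarios (toy data)
TARGET = (2.0, 6.0)             # scenario targets d_omega (toy data)
X_LO, X_HI = 0.0, 10.0          # X(omega) = [0, 10] for both scenarios
LAM_LO, LAM_HI = -20.0, 20.0    # box on the multiplier, added for boundedness

Cut = Tuple[float, float]       # cut v <= a + b * lam stored as (a, b)


def solve_subproblems(lam: float) -> Tuple[float, ...]:
    """Step 1: minimise (x - d)^2 - xi * x over [X_LO, X_HI] per scenario."""
    xis = (lam, -lam)           # dual feasible because the probabilities are equal
    return tuple(min(max(d + xi / 2.0, X_LO), X_HI) for d, xi in zip(TARGET, xis))


def make_cut(x: Tuple[float, ...]) -> Cut:
    """Step 2: the cut v <= E[phi(x)] - xi o x, written as a + b * lam."""
    expected_cost = sum(p * (xv - d) ** 2 for p, xv, d in zip(PROB, x, TARGET))
    slope = -(PROB[0] * x[0] - PROB[1] * x[1])
    return expected_cost, slope


def solve_master(cuts: List[Cut]) -> Tuple[float, float]:
    """Step 3: maximise min_j (a_j + b_j * lam) over the box by enumeration."""
    candidates = [LAM_LO, LAM_HI]
    for (a1, b1), (a2, b2) in combinations(cuts, 2):
        if b1 != b2:
            lam = (a2 - a1) / (b1 - b2)
            if LAM_LO <= lam <= LAM_HI:
                candidates.append(lam)

    def model(lam: float) -> float:
        return min(a + b * lam for a, b in cuts)

    best = max(candidates, key=model)
    return model(best), best


def scenario_decomposition(eps: float = 1e-6, max_iter: int = 100):
    lam, cuts = 0.0, []
    best_dual, best_lam = float("-inf"), lam
    for k in range(1, max_iter + 1):
        a, b = make_cut(solve_subproblems(lam))
        cuts.append((a, b))
        if a + b * lam > best_dual:          # each cut is tight where it was generated
            best_dual, best_lam = a + b * lam, lam
        v, lam_next = solve_master(cuts)
        if v - best_dual < eps:
            return best_dual, best_lam, k
        lam = lam_next
    raise RuntimeError("no convergence within max_iter iterations")


value, lam, iterations = scenario_decomposition()
print(value, lam, iterations)
```

For this toy instance the primal problem can be solved by hand: the best common decision is $z = 4$, with expected cost $0.5 \cdot 4 + 0.5 \cdot 4 = 4$, and the dual value returned by the sketch agrees with it to within the tolerance.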

By way of example, note that in Step 1 of the method, we require
that $x^k(\tilde\omega)$ be an optimal solution with probability one. If $\tilde\omega$ is a discrete
random variable, this requires the solution of the indicated problem for




186 Julia L. Higle and Suvrajeet Sen 



all possible scenarios. Otherwise, it is necessary to obtain an optimal solution
for all scenarios outside a set of probability measure zero. Clearly,
as the number of possible scenarios increases, each execution of Step 1
becomes a substantial task. Related to this is the fact that in Step 3 of the
method, the columns may be described as $\{\xi(\omega)\}_{\omega \in \Omega}$. Again, as the
number of scenarios increases, the column dimension grows quickly out
of hand. Additionally, as with all cutting plane methods, the proliferation
of cuts in the master program will become problematic. Because of
these observations, as the number of scenarios increases, the ability to
actually execute Step 3 diminishes.

Both of these problems are related to the specification of $\Omega$. Of course,

when dealing with large sample spaces, there is a natural tendency to 
look toward sample-based methods. Indeed, these have been success- 
fully used in the solution of large scale two-stage stochastic programs. 
In particular, Stochastic Decomposition (SD) and its variants, which
use a successive sampling scheme (as opposed to a fixed sample), are the
only such methods capable of identifying an optimal solution
to the problem with probability one (see, for example, Higle and Sen
[1991, 1996a]). SD begins with a small sample size, and increases it
as iterations progress. In this manner, Steps 1 and 3 may be executed
rather quickly in the early iterations, when one expects the iterates
$\{\xi^k\}$ to be far from an optimal solution. As iterations progress, the information
obtained via the sample becomes increasingly accurate, as do
the resulting cutting planes and solutions. We note that safeguards are
required to prevent cutting planes derived in the early iterations (i.e.,
when the sample-based information is most likely to be inaccurate) from
interfering with the search for an improved solution. Within the context
of the Scenario Decomposition method, note that the use of sampled
observations of $\tilde\omega$ will impact the definition of dual feasibility. Moreover,
as iterations progress and larger sample sizes are used, cut proliferation
and the corresponding growth in the master problem column dimension
become problematic.

Sampled Scenario Decomposition (SSD) is an algorithmic methodol- 
ogy based on the Basic Scenario Decomposition scheme described in this 
paper. SSD is designed specifically to address the issues identified above. 
As the name indicates, it is based on the use of sampled observations of 
$\tilde\omega$. In addition, as iterations progress, it incorporates a row and column
aggregation scheme designed to control the size of the master program. 
Additional features are also included. We refer the interested reader 
to Sen, Higle and Rayco [1999] for a full description of the method, its 
properties, and preliminary computational results.

5. CONCLUSIONS

In this paper, we have studied a dual approach to multi-stage stochas- 
tic programming problems. This development is closely tied to deter- 
ministic conjugate duality, with the only additional requirement in the 
stochastic case being the inclusion of constraints requiring the condi- 
tional expectation of dual multipliers to be zero. By interpreting dual 
multipliers as subgradients of the cost-to-go function, dual feasibility 
requires the natural optimality condition. This approach is not only 
elegant, but leads to algorithms in which we can avoid stage-by-stage 
recursions. The advantage of doing so is that we can address a larger 
class of problems, especially those in which the objective function is not 
separable by stage. This approach is also amenable to the use of sample- 
based algorithms, although the size of the master problems can become 
arbitrarily large. By using appropriate aggregation schemes, we are able 
to maintain approximations of reasonable size. Such computations are 
reported in a forthcoming paper.

Abstract The forest harvesting problem involves the construction of a schedule for
felling the individual blocks of trees which comprise a large commercial 
plantation. A strategic model sets long-term harvesting goals in terms 
of total area to be cut each year, but fails to identify individual blocks. 
A tactical model produces a short-term schedule of actual blocks. Until 
recently, most planning by forest managers has involved these as two 
separate models, often resulting in contradictory recommendations. We 
present an integrated model, which embraces both strategic and tactical 
decisions, which can be solved by optimisation methods. 

This model achieves a detailed formulation by means of a non-stan- 
dard column generation structure. The solution algorithm solves the 
resulting relaxed linear program formulation. This is then combined
with constraint branching techniques to obtain the desired optimal integer
solution to the integrated model.

Numerical output from a case study involving Whangapoua Forest 
in Coromandel, New Zealand, will be used to demonstrate the perfor- 
mance level of this algorithm in terms of the quality of optimisation, the 
size of the data base and the computational time. However, the main 
emphasis will be placed on the theoretical aspects of the model and its 
solution algorithm as the approach may well be transferable to other, 
quite distinct, applications. 

1. INTRODUCTION TO A CASE STUDY 

The Coromandel Peninsula is a magnificent bush-clad region of dis- 
sected hill country fringed by beautiful sandy coves and estuaries located 
in Northern New Zealand. Although the climate is generally warm with 
enough gentle rain to promote ideal growing conditions for trees, the 
location is prone to intense rainfall events and the associated risk of 
severe mass movement. Many tourists visit the Coromandel, making 
the preservation of the natural environment a top priority. The local 
community is deeply involved in conservation issues. The Whangapoua 
Forest occupies 7365 hectares on this peninsula. The forest is operated 
by Ernslaw One Ltd. The management of this company use a rota- 
tion of about 28 years. As a consequence, a horizon of about 30 years 
is needed for the long term strategic harvest planning. The strategic 
plan contains many constraints. Typical of these are the constraints re- 
sulting from the demanding environmental restrictions which limit the 
proportion of each catchment that may be harvested over any consec- 
utive 5-year period. None of the strategic constraints concern specific 
blocks of trees. A second type of planning, called tactical planning, is 
needed for this purpose. Such detailed area-specific planning is required 
for a much shorter period of 2 to 5 years. At present the forest contains 
145 surveyed mature blocks which contain trees representing about 30 
crop types. This latter type of planning involves decisions on when each 
specific block will be felled. Each block must be felled as a single har- 
vesting operation. It is not permitted to partially cut a block and then 
return later to complete the harvest. Further environmental and opera- 
tional restrictions apply to this tactical planning. These include the so 
called cable logging adjacency requirements, which identify specific pairs 
or cliques of blocks and impose restrictions on when each member may 
be harvested in relation to the other members of the clique. 

At present, harvest planning in the Whangapoua Forest is done in the 
following manner. Linear programming software is used to get an optimal
strategic plan, followed by 2 or 3 days of manual activity to convert
this into a feasible tactical plan for the next 2 or 3 years. The connection
between the strategic plan and the tactical plan is very tenuous,
so the optimal aspects of the strategic plan are lost. When the process 
is repeated 1 or 2 years later a totally different optimal strategic plan 
appears! Also forest managers are worried about issues of sustainability 
and long-term block-feasibility. For these reasons the Forest Research 
Institute was approached for advice, and our team, seeing this as an 
ideal research case study, became involved from early 1994. 

2. SALIENT CHARACTERISTICS 

The following distinctive characteristics of this application impact 
substantially on the model which follows. There are distinct short and 
long term planning tasks. A great many complex constraints are present 
in both situations. Mixed integer programming is required with large 
numbers of integer variables. Many of the management decisions are
inter-related, such as road construction and removal of harvested logs. 
Terrain and the road network provide a partition of the forest into sig- 
nificant smaller units. There are problems associated with the sheer size 
of this application. The actual harvesting decisions concern a moderate 
number of blocks, 145, to be harvested over a small number of years, 
6. The excessive size of the application results from the combinatorial 
aspects of this situation. The model and the algorithm which follows 
may be transferable to other applications with similar characteristics. 

3. HISTORICAL SETTING

Mathematical optimisation techniques have been applied to the solu- 
tion of forest harvesting problems for over 30 years. During this time the 
nature of the problem has evolved. The growing power of the conserva- 
tion movement has resulted in the inclusion of more and more constraints 
of greater severity and greater complexity. The physical size of planning 
tasks has greatly increased. The recent global drop in timber prices has
caused minimal profit margins and a compelling need to attain optimal, 
or near-optimal solutions if a competitive edge is to be maintained. 

It is helpful to consider the literature in three main categories. First 
consider the optimisation stream. This includes basic papers apply- 
ing linear programming to the strategic plan, such as that by Manley, 
Threadgill and Wakelin [9]. Realising that the tactical plan also had to 
be dealt with, attempts were made to extend a linear programme into a 
mixed integer programme. One of the best of these was that by Kirby, 
Hager and Wong [8], who developed the integrated resources planning 
model, (IRPM). This is a mixed integer formulation of an integrated
forest harvesting model. The most singular achievement of IRPM is
that it does find the optimal solution. Other researchers recognised the 
quality of Kirby’s work but noted the limitations of IRPM to only small 
applications. O’Hara, Faaland and Bare [15] report IRPM can only deal 
with at most 50 integer variables along with a matrix of 250 rows and 
250 columns. 

One positive development has been to attempt to reduce the number 
of constraints by constraint aggregation. A number of papers such as 
those by Meneghin et al. [10], and Murray and Church [12] indicate that 
currently this is an approach widely regarded as promising. The present 
model includes constraint aggregation techniques in certain places. 

When it became apparent that there were extreme difficulties in constructing
an integrated model capable of the optimisation of an operational-sized
application, a number of publications appeared advocating a
hierarchical solution. These include Hof and Baltic [4], Weintraub and 
Cholaky [21], and Hof [5]. This approach involves first optimising the 
strategic plan, and then incorporating this strategic solution as a con- 
straint in a separate optimisation of the tactical plan. Although the 
optimisation attained in this way will always be inferior to that ob- 
tained from an integrated model, the quality of the solution may be 
quite acceptable provided there are not too many constraints applica- 
ble to the tactical plan. Such an approach contains the assumption 
that, due to the aggregated nature of the variables used in the optimal 
strategic solution, a feasible solution will always be obtainable to the 
constrained tactical plan. In practice, this is almost never true, espe- 
cially as the present trend continues for governments to impose ever more 
harsh area-sensitive environmental constraints on the tactical plan. As a 
consequence, the implementation of hierarchical solution algorithms has 
generally been accompanied by an observable deterioration in objective 
value between the strategic optimal solution and the final area-sensitive 
tactical solution. Daust and Nelson [1] have recorded this as a loss of 
between 4 percent and 20 percent. Such findings question the credibility 
of the optimisation process in these hierarchical models. An integrated 
model will avoid this problem. 

The second category of forest harvesting research may be termed the 
heuristic stream. Here attention is focused on the integer variables, 
with many of these models relating exclusively to the tactical plan. A 
representative paper here would be that by O’Hara, Faaland and Bare 
[15]. They use a Monte Carlo method in which a subtle weighting is 
introduced to facilitate random generation of good quality solutions. The 
performance achieved is the heuristic solution, not necessarily optimal,
of a problem containing 242 blocks, with a tactical horizon of 3 years,
in 15 minutes on a VAX 8700.

As one could well anticipate, this stream has indulged in ever more 
complicated heuristic methods. For example, a recent paper by Yoshi- 
moto, Brodie and Sessions [24] proposes an intricate heuristic procedure 
in which the forest is partitioned into numerous sub-problems. Each 
subproblem is solved independently as a regional optimisation problem. 
A measure of the performance of this method is given by the solution of 
an application involving 10 time periods in 2 hours on a 486 PC. 

Weintraub et al. [23] present a heuristic method they term heuristic 
integer planning (HIP). This is based on the IRPM model of Kirby [8].
Kirby’s optimisation problem has the disadvantage of producing output 
at the relaxed linear programme stage in which many of the integer vari- 
ables have fractional values. HIP is a method that assigns some of these 
variables to value 0 or 1. The problem is then re-optimised and more 
fractional variables are assigned integer values until an integer solution 
is obtained. The process is classified as heuristic because the assignment 
process is heuristic, being the enforcement of a set of rules. These rules 
may be either the outcome of the user’s intuition or the trend observed 
from numerical trials. As the integer solution is obtained after 7 to 12 
iterations it is clear that many variables must be assigned integer values 
every iteration. This model is capable of solving an application involv- 
ing 44 road segments, and 28 blocks over a 3-period tactical horizon. A 
total of 190 integer variables is involved with the complete matrix con- 
taining 638 rows and 1071 columns. The solution process takes about 
20 minutes. 

A comparison such as that by Nelson and Brodie [13] of a heuristic 
with an optimisation method is very insightful. In this paper a forest
harvesting application involving 45 blocks, 52 road segments and 3 time 
periods, that is 291 integer variables, is first solved to optimality by a 
mixed integer programme algorithm. Using two computers, an 80386 PC
and a hyper-LINDO PC this is achieved in 60 hours. The same appli- 
cation is then solved by a Monte-Carlo integer programming (MCIP) 
heuristic in 9 hours, but the best objective value obtained is 97 percent 
of the optimal value obtained from the optimisation method. 

Sessions and Sessions [17] have developed heuristic software that pro- 
duces a tactical plan without any reference to the strategic plan. The 
tactical model is formulated and solved first without including adjacency 
constraints. Monte Carlo methods are used to generate large numbers of 
candidate solutions that are then checked against adjacency constraints 
to determine which, if any, are feasible. The resulting software is named 
SNAP II. John Sessions has collaborated with Nelson and Brodie [14]
to investigate techniques to use SNAP II in association with the linear
programming strategic software FORPLAN. Since these two packages 
are quite distinct an integrated optimisation is extremely difficult. 

The third and final division consists of multiple-stage methods. These 
also deal primarily with the tactical plan and the difficulties relating to 
the integer variables contained in it. A number of recent papers rec- 
ommend methods that consist of several stages, with different methods 
at each stage. One of the best of these would be that by Weintraub, 
Barahona and Epstein, [22] who suggest a column-generating technique 
to solve a forest harvesting model. Unfortunately, even after some major 
simplification assumptions, the formulation leads to a series of very diffi- 
cult and complex sub-problems for which an optimal solution cannot be 
attained. Probably the most serious weakness of this type of approach is 
the extreme difficulty of driving a multi-stage method close to optimal- 
ity. The task of trying to establish a sound theoretical foundation for 
such a structure is daunting. Weintraub et al report successful imple- 
mentation of their algorithm on a number of operational Chilean forests, 
but unfortunately do not include the computing times involved. 

Another significant multiple-stage model is that of Hoganson and 
Borges [7], who apply dynamic programming techniques to a multitude 
of subproblems. Other recent formulations include those of Snyder and 
ReVelle [19, 20]. These contain interesting mathematical insights, but 
have the weakness that they are presented in the context of hypothet- 
ical integerised grid data. Hof, Bevers and Pickens [6] utilise a similar 
digitised grid, but they show how this is derived from a real map of 
an actual forest. Naturally their grid cells do not correspond to actual 
cutting blocks, a factor that must cause significant difficulties when any 
adjacency constraints are needed. Hochbaum and Pathria [3] also use a 
grid structure for their model. It is noteworthy that they attempt to ne- 
gotiate these difficulties involving adjacency constraints, by penalising, 
rather than prohibiting, infringements of adjacency restrictions. 

Sherali, Adams and Driscoll [18] present a helpful generalised theo- 
retical model which stresses the importance of attaining a tight linear 
programming relaxation, which curiously is a feature of the model to be 
presented below. 

The most salient aspect of this substantial body of research literature 
is its extreme diversity. This indicates that despite much excellence 
and insight, not one of the many and varied solution methods has won 
any degree of wide-spread acceptance. This is a strong indicator that 
the forest harvesting problem is at present unsolved in an optimisation 
sense.

One extremely important related problem is that of scheduling the
flights of airline crews into tours of duty, along with the related problem 
of rostering crews to these tours of duty. In simple terms, the models in- 
volved are huge mixed integer structures. Each possible roster for a crew 
member is represented by a column. Binary variables are used to deter- 
mine which crew member is assigned to which roster. When compared 
with forest harvesting problems there are fundamental differences. The 
crew scheduling problem belongs to the class known as set-partitioning 
problems, whereas forest harvesting problems do not. However, the 
important factor is that recent major advances have occurred in the 
technology of solving set-partitioning problems. These involve column 
generation in association with constraint branching. Desrochers et al [2] 
and Ryan [16] have developed these methods so as to obtain optimisa- 
tion solutions to applications previously thought too large for anything 
other than a heuristic treatment. As a consequence, very definite and
significant practical improvements are now available to airline management.
The solution algorithm of McNaughton [11] applied constraint
branching to the forest harvesting problem for the first time. 

4. MAIN FEATURES OF THE PROPOSED 
MODEL 

An integrated model has been chosen so as to get maximum benefits 
from optimisation, and to avoid conflict between tactical and strategic 
plans. This means that the strategic part of the plan, which involves 
continuous variables, will be joined by linking constraints to the tactical 
part of the plan which contains binary integer variables. An objective 
function will be used which represents the present net worth of the for- 
est, with revenue and costs from planned future harvests discounted 
appropriately. It is significant that the revenue part of the objective is
associated with the strategic part of the model, while the costs are mostly 
generated by the tactical part. The solution will then be obtained by 
a single optimisation process acting on the whole model. We will allow 
30 years for the strategic plan and 6 years for the tactical plan, using a 
time period of 1 year. The strategic part of the model will contain about 
1820 continuous variables and about 1120 constraints. The tactical part 
of the model will contain about 1560 binary integer variables and about 
1100 constraints. Although it is not possible to list every one of these 
constraints in detail, the most significant of them will be displayed and
explained in Section 5. Road decision variables will be included in the 
tactical plan. Many more binary integer variables will be introduced
by column generation during the solution process as will be explained
below. 

5. SOME MODEL DETAIL 

The variables in the strategic part of the model, $x_{cet}$, are continuous,
with

$x_{cet}$ = hectares of croptype c established in year e harvested in year t.

These are used to construct constraints in the strategic plan, which 
has the structure of a linear programme. For example, there may be a 
requirement that no more than 200 hectares be felled in a year. This 
would involve the maximum area constraint 

$$\sum_{c,e} x_{cet} \le 200. \tag{1}$$

This constraint, as with the others which follow, is representative of a 
large block of constraints. In this case a separate constraint is required 
for each establishment year for each croptype in the forest. The other 
constraints in the strategic part of the model are mostly of equally simple 
formulation. These concern matters such as achieving a non-declining
yield and ensuring that no more of a given croptype is harvested than 
is currently available. Since the structure of this part is that of a linear 
programme, it poses no significant difficulties in the solution process, 
other than those associated with its size, comprising as it does many
hundreds of constraints. 

For the tactical part of the model the formulation is much more com- 
plex. The following key concept is of pivotal importance in the model 
as it provides the definitions for the principal decision variables. A road
harvest plan is a set of tactical decisions all pertaining to the harvesting 
of blocks on one given road. These may span the entire tactical horizon. 
Only one road harvest plan will be permitted to be operating in the so- 
lution for each road, although many alternative road harvest plans may 
be generated in the model. Here is an example of a road harvest plan: 

On road 14, cut block 2 in year 3, block 3 in year 2, and block 5 in year 6. 

In the linear programming matrix representation of the model, a unique 
column with an associated integer variable will be associated with each 
road harvest plan. In this case the variable $g_{jtn}$ is used.

$$g_{jtn} = \begin{cases} 1 & \text{if the $n$-th plan on road $j$ is chosen, harvesting starting in year $t$,} \\ 0 & \text{if the $n$-th plan on road $j$ is not chosen.} \end{cases}$$

For each road, one of these variables will represent a null harvest plan
in which no harvesting occurs during the planning horizon.
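
As an illustration of the road harvest plan concept, the sketch below stores one plan as a small immutable record. The class name `RoadHarvestPlan` and its fields are hypothetical and are not taken from the authors' software; the example data are the road 14 plan quoted above, and an empty list of cuts stands for the null harvest plan.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RoadHarvestPlan:
    """One candidate column: a set of (block, year) cuts on a single road."""
    road: int
    cuts: Tuple[Tuple[int, int], ...] = ()  # empty tuple = null harvest plan

    @property
    def is_null(self) -> bool:
        return len(self.cuts) == 0

    @property
    def start_year(self) -> Optional[int]:
        """Year in which harvesting starts, or None for the null plan."""
        return None if self.is_null else min(year for _, year in self.cuts)

    def cuts_block_in_year(self, block: int, year: int) -> bool:
        """True when the plan harvests `block` in `year`."""
        return (block, year) in self.cuts


# "On road 14, cut block 2 in year 3, block 3 in year 2, and block 5 in year 6."
example = RoadHarvestPlan(road=14, cuts=((2, 3), (3, 2), (5, 6)))
null_plan = RoadHarvestPlan(road=14)

print(example.start_year)                # 2
print(example.cuts_block_in_year(5, 6))  # True
print(null_plan.is_null)                 # True
```

The membership test `cuts_block_in_year` corresponds to asking whether a plan involves the harvesting of a given block in a given year, which is the property the model uses later when it groups plans into sets.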

Other columns, also with associated integer variables, represent road 
construction tasks. In this case the variable $r_{jt}$ is used.

$$r_{jt} = \begin{cases} 1 & \text{if road $j$ is constructed in year $t$,} \\ 0 & \text{if road $j$ is not constructed in year $t$.} \end{cases}$$

The structure of the integrated model ensures that an optimal integer 
solution to the associated road problem is automatically produced with- 
out any explicit manipulation of these parts of the matrix, provided an 
optimal integer solution is obtained for the road harvest plan variables. 

Two of the tactical constraints have a pivotal role in the solution
algorithm and need to be studied closely. Firstly a plan constraint is
required for each road. This will ensure one and only one road harvest
plan is selected for a particular road, say $j_0$. These constraints are of
the form:

$$\sum_{t} \sum_{n} g_{j_0 t n} = 1. \tag{2}$$

Secondly, road construction constraints are used to allow harvesting on
road $j_0$ only after this road has been constructed. If $n^*$ represents the
null harvest plan, then these are of the form:

$$-\sum_{i=1}^{t} r_{j_0 i} + \sum_{i=1}^{t} \sum_{n \ne n^*} g_{j_0 i n} \le 0. \tag{3}$$



Other types of tactical constraints are of interest too. Road sequen- 
tial constraints are required to ensure that a continuous access route is 
available from the harvesting site to the timber mill. Suppose either 
road $j_1$ or $j_2$ is needed to provide access to road $j_3$. Then the necessary
constraints will be of the form

$$-\sum_{i=1}^{t} (r_{j_1 i} + r_{j_2 i}) + r_{j_3 t} \le 0. \tag{4}$$

The construction of the linking constraints is of major significance. 
Recall that the model consists of two approximately equal parts, one 
generating revenue, the other cost. They involve quite different variables 
and are completely disjoint apart from the linking constraints. If these 
are used to impose a strict equality between the croptype areas, designated
by the continuous variables, $x_{cet}$, in the strategic part, and the
areas of the appropriate specific blocks chosen for harvest, represented
by expressions involving the integer variables, $g_{jtn}$, in the tactical part,
then the model becomes too stiff. In this case the performance of the
solution algorithm is greatly impaired. On the other hand, if slack variables
are introduced the solution algorithm runs fast but the integrity
of the optimisation is violated. Instead new continuous variables, $s_{cet}$,
called overlap variables, are introduced.

$s_{cet}$ = the area of croptype c established in year e harvested
in year t which is located in blocks where harvesting
will commence in year t.

Let the constant Ocejk represent the area of croptype c established 
in year e located in block k on road j. Let Gjtk be the set of all gjtn 
which involve the harvesting of block k on road j in year t. The linking 
constraints will then be of the form 

~^cet ~ ^cet "T 1] "h ^ ^ ^cejk ^ ^ 9 jin ~ (^) 

3 ^ t k 

The optimisation process, in association with the discounted nature of 
the objective and the requirement that the tactical variables be integer, 
will ensure that only a very few of these overlap variables, $s_{cet}$, are
non-zero in the final solution.

Finally there will be many adjacency constraints which record restric- 
tions on the permitted harvesting time for a block in relation to that of 
a given neighbouring block. As an example suppose that for technical 
reasons the logging of block a on road $j_a$ required the logging of block b
on road $j_b$ within at most 2 years. This occurs if a harvesting technique
called cable logging is being used and certain local terrain conditions ap- 
ply. Let the tactical horizon be 3 years. Then the necessary adjacency 
constraint would be 

“1 ^ ^ ^ gjtn ~~ 3 ^ ^ 9jtn 2 ^ ^ gjtn 

^jala ^ 0 ^ 

~2 gjtn + gjtn ~ 9jtn ^ 1* 

<^ja3a 



6. SOLUTION ALGORITHM 

During the solution algorithm the difficulties associated with the integer
variables are circumvented by considering the relaxed linear programme
in which each binary variable is considered as a continuous variable,
bounded to the interval [0,1]. This allows extensive use of standard
linear programming techniques throughout the solution process. In addition
two other processes are involved.

The first of these is column generation. If every possible road harvest 
plan was to be represented explicitly by a column in the matrix repre- 
sentation of the linear programme, then the size of the matrix would be 
too large. So at first only a few representative columns are included. In 
the present model these correspond to elementary road harvest plans, 
$e_{jtk}$, which each concern the harvesting of just one single block in a given
year.

$$e_{jtk} = \begin{cases} 1 & \text{if block $k$ on road $j$ is cut in year $t$,} \\ 0 & \text{otherwise.} \end{cases}$$

The variables $e_{jtk}$ are a subset of the more general $g_{jtn}$. In each iteration
new columns are added to the linear programme matrix, each represent- 
ing a road harvest plan. Each new column will be a composition of 2 or 
more of these elementary road harvest plans, which is selected to improve 
the current objective value, until an optimal relaxed linear programme 
solution is obtained. In this model a non-standard type of column gen- 
eration has been used which delivers a high performance level. The role 
of the $e_{jtk}$ variables in this column generation will be explained in the
next section. 

The second process is constraint branching. This is a special type 
of branch and bound. It is used to remove fractional values from the 
solution to the relaxed linear programme. It works much faster than 
traditional variable branching, but can only be used if the model has 
been constructed appropriately. 

During the implementation of the solution algorithm these two pro- 
cesses work together. First a phase of column generation is used to solve 
the relaxed linear programme. Then a suitable constraint branch is im- 
plemented. The problem is then re-optimised with a further phase of 
column generation as the new constraint branch may well have made 
certain new columns desirable. The rapidity of the column generation 
process makes this easy. If the new relaxed linear programme has an 
integer solution then we stop. Otherwise another constraint branch is 
chosen and the process continues iteratively. 
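
The alternation described above can be sketched as a simple control loop. This is only an outline under stated assumptions: the four callables (`solve_relaxed_lp`, `add_improving_columns`, `is_integer`, `choose_branch`) are hypothetical placeholders for the real linear programming and pricing routines, and the loop only dives along successive branches, whereas a complete branch and bound search would also need to backtrack. The toy stand-ins at the end exist purely so that the sketch runs.

```python
from typing import Any, Callable, List, Tuple


def dive_with_column_generation(
    solve_relaxed_lp: Callable[[List[Any]], Any],
    add_improving_columns: Callable[[Any, List[Any]], int],
    is_integer: Callable[[Any], bool],
    choose_branch: Callable[[Any], Any],
    max_branches: int = 1000,
) -> Tuple[Any, List[Any]]:
    """Alternate column generation phases with constraint branches."""
    branches: List[Any] = []
    while True:
        # Column generation phase: re-solve until no improving column is found.
        while True:
            solution = solve_relaxed_lp(branches)
            if add_improving_columns(solution, branches) == 0:
                break
        if is_integer(solution):
            return solution, branches
        if len(branches) >= max_branches:
            raise RuntimeError("branch limit reached without an integer solution")
        # Impose one more constraint branch, then re-optimise.
        branches.append(choose_branch(solution))


# Toy stand-ins (illustrative only): the "LP" reports how many fractional sums
# remain, and each new branch makes one more column worth generating.
state = {"columns": 1}


def toy_solve(branches):
    return {"fractional": max(0, 2 - len(branches)), "columns": state["columns"]}


def toy_price(solution, branches):
    if state["columns"] < 3 + len(branches):
        state["columns"] += 1
        return 1
    return 0


final, applied = dive_with_column_generation(
    toy_solve,
    toy_price,
    is_integer=lambda s: s["fractional"] == 0,
    choose_branch=lambda s: ("branch", s["fractional"]),
)
print(final, applied)
```

Passing the solver steps in as callables keeps the control flow visible without committing the sketch to any particular linear programming library.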



7. TECHNICAL DETAILS OF THE COLUMN 
GENERATION 

When a linear programme is being solved by the revised simplex 
method, a vector called the reduced cost is obtained. The reduced cost 
of a basic variable is always 0. So it is only necessary to define the reduced
cost for the non-basic variables. Let the linear programme be of
the form

$$\text{minimise } c^T x \quad \text{subject to } Ax \le b, \; 0 \le x.$$

Let $B$ be the submatrix of basic columns of $A$, $c_B$ be a vector of objective
coefficients of the basic variables, $N$ be the submatrix of non-basic
columns of $A$, and $c_N$ be a vector of objective coefficients of the non-basic
variables. The reduced cost vector, $rc$, is defined by

$$rc = c_N^T - c_B^T B^{-1} N. \tag{7}$$

In the revised simplex method a negative component of the reduced cost 
vector indicates that the corresponding variable may be added to the ba- 
sis without causing a deterioration of the objective value. A reduced cost 
vector which is entirely non-negative is taken as the test for optimality. 

When a column generation method is being used, the matrix A con- 
tains only a representative sample of all the many possible columns the 
application permits. In this case the reduced cost may be evaluated for 
any of the other columns which are being considered as possible entering 
columns. Once again, a negative reduced cost indicates that this new 
column may be added to matrix A, in fact to the basis in A, without 
causing a deterioration of the objective value. If it can be shown that no 
entering column with a negative reduced cost exists, then this may be 
used as a test for optimality. Unfortunately, the definition in Equation 
7 is unsuitable for use in a column generation context, as the set of all 
possible entering columns is not explicitly accessible. Also the task of 
evaluating the reduced cost for every such column would generally be 
unacceptably tedious. 

Most column generation methods deal with this problem by construct- 
ing a subproblem designed to produce an entering column with a nega- 
tive reduced cost. These subproblems are of varied types, depending on 
the nature of the application, but are generally complicated and difficult 
to solve. A good example of this has already been noted in the work of 
Weintraub et al [22]. In the present model, information obtained from 
the solution to the current linear programme is used to determine the 
minimum reduced cost of any entering column. The solution to a linear
programme includes both a value of the reduced cost variable, $rc$,
associated with each variable, and also a value of the dual variable, $\pi$,
associated with each constraint. A vector of dual variable values may
be defined by

$$\pi = c_B^T B^{-1}. \tag{8}$$

The dual variables and the reduced cost variables are related by the
following equation, where $A = (a_{ij})$, a matrix with $n$ rows.

$$rc_j = c_j - \sum_{i=1}^{n} a_{ij} \pi_i. \tag{9}$$



Equation 9 allows the reduced cost of any entering column to be de- 
termined. Unfortunately it requires the new column to be constructed 
explicitly before the reduced cost can be computed. This would be unsat- 
isfactory in view of the very large number of possible entering columns. 
So a refinement of Equation 9 will be derived which will allow a reduced 
cost to be found without having to construct the new column. 
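
The relationship between dual values and reduced costs in Equations 8 and 9 can be checked on a very small linear programme. The sketch below is an illustration only: the two-constraint problem, the chosen basis and the helper names (`solve_square`, `dual_values`, `reduced_cost`) are all assumptions introduced here, and exact fractions are used so that no rounding obscures the result.

```python
from fractions import Fraction


def solve_square(matrix, rhs):
    """Solve a small square linear system exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def dual_values(columns, costs, basis):
    """Equation 8: pi = c_B^T B^{-1}, obtained by solving B^T pi = c_B."""
    b_transpose = [columns[j] for j in basis]  # each basic column is a row of B^T
    return solve_square(b_transpose, [costs[j] for j in basis])


def reduced_cost(column, cost, pi):
    """Equation 9: rc_j = c_j - sum_i a_ij * pi_i."""
    return cost - sum(a * p for a, p in zip(column, pi))


# minimise -3*x1 - 2*x2  subject to  x1 + x2 + s1 = 4,  x1 + 3*x2 + s2 = 6
columns = {"x1": [1, 1], "x2": [1, 3], "s1": [1, 0], "s2": [0, 1]}
costs = {"x1": -3, "x2": -2, "s1": 0, "s2": 0}
basis = ["x1", "s2"]  # feasible basis: x1 = 4, s2 = 2

pi = dual_values(columns, costs, basis)
rc = {name: reduced_cost(col, costs[name], pi) for name, col in columns.items()}
print(pi)
print(rc)
```

In this example the basic variables have reduced cost 0 and the non-basic ones are non-negative, so the chosen basis passes the optimality test described above.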

Recall that the initial A matrix contains one column for each elemen- 
tary road harvest plan. Each entering column will be a composite road 
harvest plan involving the harvesting of 2 or more blocks on the same 
road. When such a composite column is formed, most entries will be a 
simple vector addition of the corresponding entries in the appropriate 
elementary columns. If this were true for all entries, then the reduced 
cost of the entering column would be merely the sum of the known re- 
duced costs of these elementary columns. However, in the case of the 
plan constraints, and any trigger constraints present such as the road 
construction constraints, the composite column is not formed as a sum, 
but rather as the maximum of the set of corresponding entries. 
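
The rule for forming a composite column can be sketched as follows. The row labels (`"plan"`, `("road", year)`, `("area", year)`) and the numerical entries are hypothetical; the sketch only assumes, as stated above, that plan and trigger rows take the maximum of the elementary entries while every other row takes their sum, and that all entries in the maximum rows are non-negative.

```python
def compose_columns(elementary_columns, max_rows):
    """Combine elementary columns: maximum on `max_rows`, sum on all other rows."""
    composite = {}
    for column in elementary_columns:
        for row, value in column.items():
            if row in max_rows:
                composite[row] = max(composite.get(row, 0), value)
            else:
                composite[row] = composite.get(row, 0) + value
    return composite


# Plan constraint and road construction (trigger) rows for one road, 3-year horizon.
plan_and_trigger_rows = {"plan", ("road", 1), ("road", 2), ("road", 3)}

# Elementary plans: block 1 cut in year 2, block 2 cut in year 3.
cut_block_1_year_2 = {"plan": 1, ("road", 2): 1, ("road", 3): 1, ("area", 2): 10}
cut_block_2_year_3 = {"plan": 1, ("road", 3): 1, ("area", 3): 7}

composite = compose_columns(
    [cut_block_1_year_2, cut_block_2_year_3], plan_and_trigger_rows
)
print(composite)
```

The composite keeps a single entry of 1 in the plan row and in each road row from the earliest harvest year onwards, rather than the 2 that plain vector addition would give.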

Consider the column corresponding to a road harvest plan $g_{jtn}$. From
Equation 2 this column will contain an entry of 1 in the plan constraint
associated with road j. Let the associated dual variable be $\pi_j$. Similarly,
from Equation 3 this same column will contain an entry of 1 in each of
the road construction constraints associated with road j from year t to
$t'$, where $t'$ is the end of the tactical planning horizon. Let the dual
variables associated with these road constraints be $\pi_{ji}$, $i = t, \ldots, t'$.

Now consider $\overline{rc}(g_{jtn})$, the reduced cost of a column, corresponding to
a road harvest plan $g_{jtn}$, in which the entries for the plan constraints and
the road construction constraints are removed. Since this truncated column
is the vector sum of the appropriate truncated elementary columns,
$\overline{rc}(g_{jtn})$ is the sum of the reduced costs of these truncated elementary
columns.

$$\overline{rc}(g_{jtn}) = \sum_{e_{jt_mk_m} \in g_{jtn}} \overline{rc}(e_{jt_mk_m}), \tag{10}$$

where the sum runs over the elementary plans $e_{jt_mk_m}$ (block $k_m$ cut in
year $t_m$) that make up $g_{jtn}$.

Also, it follows from Equation 9 that the truncated reduced cost for an
elementary column, $\overline{rc}(e_{jtk})$, is readily computed by

$$\overline{rc}(e_{jtk}) = rc(e_{jtk}) + \pi_j + \sum_{i=t}^{t'} \pi_{ji}. \tag{11}$$



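Let $g_{jtn}$ represent a composite plan with block $k_m$ harvested in year
$t_m$. Observe $t = \min\{t_m\}$. Then

$$\begin{aligned}
rc(g_{jtn}) &= \overline{rc}(g_{jtn}) - \pi_j - \sum_{i=t}^{t'} \pi_{ji} && \text{(as in (11))} \\
&= \sum_{e_{jt_mk_m} \in g_{jtn}} \overline{rc}(e_{jt_mk_m}) - \pi_j - \sum_{i=t}^{t'} \pi_{ji} \\
&= \sum_{e_{jt_mk_m} \in g_{jtn}} \Bigl( rc(e_{jt_mk_m}) + \pi_j + \sum_{i=t_m}^{t'} \pi_{ji} \Bigr) - \pi_j - \sum_{i=t}^{t'} \pi_{ji}.
\end{aligned} \tag{12}$$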



Equation 12 allows the minimum reduced cost for any entering column 
to be determined merely by scanning the known values of the reduced 
costs of the elementary columns and the dual variables of the appropri- 
ate trigger constraints. Any elementary column excluded by the active 
constraint branches is omitted from this scan. If this minimum value is 
negative, then the appropriate new column can be easily constructed and 
added to the matrix. If this minimum value is non-negative, optimality 
has been attained. 
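
As a consistency check on Equation 12, the sketch below prices one composite plan in two ways: directly from its explicitly built column, using Equation 9, and indirectly from the reduced costs of its elementary columns and the relevant dual values. All numbers, row labels and function names are hypothetical, and the sketch assumes that the objective coefficient of a composite plan is the sum of the coefficients of its elementary plans.

```python
def reduced_cost(cost, column, duals):
    """Equation 9: objective coefficient minus column entries weighted by duals."""
    return cost - sum(value * duals[row] for row, value in column.items())


def plan_column(start_year, other_entries, horizon_end):
    """Column with a plan entry and road entries from start_year to horizon_end."""
    column = dict(other_entries)
    column["plan"] = 1
    for i in range(start_year, horizon_end + 1):
        column[("road", i)] = 1
    return column


def road_dual_sum(duals, first_year, horizon_end):
    return sum(duals[("road", i)] for i in range(first_year, horizon_end + 1))


def truncated_rc(rc_elementary, year, duals, horizon_end):
    """Equation 11: add back the plan and road construction dual values."""
    return rc_elementary + duals["plan"] + road_dual_sum(duals, year, horizon_end)


def composite_rc(elementary, duals, horizon_end):
    """Equation 12, for a list of (harvest year, elementary reduced cost) pairs."""
    start = min(year for year, _ in elementary)
    total = sum(truncated_rc(rc, year, duals, horizon_end) for year, rc in elementary)
    return total - duals["plan"] - road_dual_sum(duals, start, horizon_end)


horizon_end = 3
duals = {"plan": -30, ("road", 1): -1, ("road", 2): -2, ("road", 3): -3, "area": -1}

# Elementary plans: block A cut in year 2 (cost -40), block B cut in year 3 (cost -30).
rc_a = reduced_cost(-40, plan_column(2, {"area": 10}, horizon_end), duals)
rc_b = reduced_cost(-30, plan_column(3, {"area": 7}, horizon_end), duals)

# Composite built explicitly: ordinary rows add, plan and road rows take the maximum.
direct = reduced_cost(-70, plan_column(2, {"area": 17}, horizon_end), duals)
via_equation_12 = composite_rc([(2, rc_a), (3, rc_b)], duals, horizon_end)

print(rc_a, rc_b)               # 5 10
print(direct, via_equation_12)  # -18 -18
```

In this illustration both elementary columns have non-negative reduced costs, yet their composite has a negative one, so it is a composite plan that would be generated and added to the matrix.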



8. TECHNICAL DETAILS OF THE 
CONSTRAINT BRANCHING 

Although $r_{jt}$ and $g_{jtn}$ are all binary integer variables, during the solution
process the problem is treated as just one large relaxed linear
programme. This is necessary whenever column generation techniques 
are to be applied. Integer solutions are obtained by a branch and bound 
process. Until recently, this was generally done by a process called vari- 
able branching in which a single binary variable was adjusted to a value 
of 1, on the 1-branch, or 0, on the 0-branch. The use of variable branch- 
ing results in a slow convergence to an integer solution with a massive 
binary tree requiring to be searched. 

Instead of variable branching, the present algorithm uses constraint 
branching, similar to that developed by Desrochers et al [2] and Ryan
[16]. In the forest harvesting application this entails using the sets $G_{jtk}$
which have already been introduced in relation to the linking constraints
of Equation 5 and the adjacency constraints of Equation 6. For example,
a typical branching node consists of a 0-branch

$$\sum_{t \le T} \sum_{G_{jtk}} g_{jtn} = 0,$$

and a 1-branch

$$\sum_{t \le T} \sum_{G_{jtk}} g_{jtn} = 1.$$



Thus the 0-branch prevents the adoption of any road harvest plan in 
which block k is felled by year T. The 1-branch requires that one of 
these harvest plans must be chosen, without specifying exactly which of 
all the many eligible plans this will be. One compelling reason for the use 
of constraint branching is the dramatic improvement in computational 
time which results. However, this technique cannot be applied unless the 
model has been formulated in an appropriate manner. In the present 
case the necessary sets of road harvest plans, $G_{jtk}$, have been carefully
built into the model for this purpose. 
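
A constraint branch of this kind can be sketched on the plans of a single road. In the sketch below a plan is a tuple of (block, year) pairs and `lp_values` holds hypothetical fractional values from the relaxed linear programme. The rule of branching on the most fractional sum is an assumption made for illustration and is not claimed to be the rule used in the case study.

```python
from fractions import Fraction


def cuts_block_by(plan, block, last_year):
    """True when the plan fells `block` in some year up to `last_year`."""
    return any(b == block and year <= last_year for b, year in plan)


def branch_sum(lp_values, block, last_year):
    """Sum of LP values over all plans that fell `block` by `last_year`."""
    return sum(
        value for plan, value in lp_values.items()
        if cuts_block_by(plan, block, last_year)
    )


def most_fractional_branch(lp_values, blocks, years):
    """Pick the (block, last_year) pair whose sum is furthest from 0 and 1."""
    candidates = [(block, year) for block in blocks for year in years]

    def fractionality(candidate):
        s = branch_sum(lp_values, *candidate)
        return min(s, 1 - s)

    best = max(candidates, key=fractionality)
    return best if fractionality(best) > 0 else None


def allowed_on_branch(plan, block, last_year, branch):
    """0-branch: plans in the set are barred. 1-branch: because exactly one plan
    per road is chosen, only plans in the set remain eligible."""
    inside = cuts_block_by(plan, block, last_year)
    return inside if branch == 1 else not inside


# Two plans on one road, each at value 1/2 in the relaxed solution.
plan_p = ((1, 1), (2, 2))  # block 1 in year 1, block 2 in year 2
plan_q = ((1, 2),)         # block 1 in year 2
lp_values = {plan_p: Fraction(1, 2), plan_q: Fraction(1, 2)}

print(most_fractional_branch(lp_values, blocks=[1, 2], years=[1, 2, 3]))  # (1, 1)
print(allowed_on_branch(plan_p, 1, 1, branch=0))  # False
print(allowed_on_branch(plan_q, 1, 1, branch=0))  # True
```

Note that a single branch here fixes the value of a sum over many plan variables at once, which is what distinguishes it from branching on one variable at a time.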

A most helpful enhancement of constraint branching concerns what 
may be called allied constraint branches. These involve the identification 
and automatic implementation of any other constraints that are in some
way logical consequences of the chosen constraint branch. For example,
in forest harvesting an operational requirement may force a pair of adjacent
blocks to be felled within two years of each other, as in Equation
6. Let the blocks be a and b on roads $j_a$ and $j_b$ respectively. In this
case if the branch

$$\sum_{t \le T} \sum_{G_{j_a t a}} g_{j_a t n} = 1$$

is applied, then so too should be the allied branch

$$\sum_{t \le T+1} \sum_{G_{j_b t b}} g_{j_b t n} = 1.$$



Effort spent searching out allied constraint branches is amply rewarded 
by very significant reductions in computational time. 

Another good feature of this constraint branching procedure is that it 
involves the use of only one type of decision node. It is not necessary to
impose separate branches using the road variables, as the model has been
constructed in such a way that once all the sets of variables designated
by $G_{jtk}$ have been forced to integer sums then all other non-continuous
components will automatically assume integer values. As a result of 
this the preferred direction in the branch and bound process is readily 
apparent, allowing a good quality integer solution to be attained at a 
relatively shallow depth. 

Applications which involve column generation almost always require 
the use of branch and bound to obtain an integer solution. Consequently, 
the evaluation of a column generation technique should take into account 
the associated branch and bound process. In the forest harvesting ap- 
plication, the value of the column generation would be largely wasted 
without the constraint branching. During the implementation of the 
solution algorithm both these processes work together with alternating 
phases of column generation and constraint branching as described in 
Sections 6 and 7. The quality of the resulting solution will next be 
presented. 

9. PERFORMANCE LEVELS ATTAINED IN 
CASE STUDY 

The main output from this case study includes 5 complete years of a 
detailed cutting plan, a road construction schedule to match, and a har- 
monised 30 year strategic plan. An optimal integer solution is obtained 
at a depth of about 80 nodes, after about 6 minutes computational time 
on a Solaris 2.6 (SPARC) computer, with an objective value of 99.95 percent
of that of the relaxed linear programme.

The robustness of the model has been tested by numerical trials in
which various parameters of the problem have been increased. For
example, the number of years in the tactical plan has been increased from
3 to 10. The number and type of adjacency constraints have also been
varied: from 0 to 132 cable logging adjacency constraints of the form
given in Equation 6 have been imposed, and up to 144 alternative
green-up adjacency constraints have been used. The latter prevent the
harvesting of adjacent blocks within a green-up period of 5 years. Such
modifications change the objective value, but in each case the optimal
objective value remains within half of one percent of that of the
corresponding relaxed linear programme. This indicates a tight linear
programme relaxation, as advocated by Sherali et al. [18].

In all these trials the computational times lie between 6 and 8 minutes.
This should be compared with the computational times of the traditional
MIP models already reported in Section 3. Such models can only
be implemented for very small applications, as noted by O'Hara et al.
[15]. The present Whangapoua Forest case study is far too large to
model as a traditional MIP formulation. Such a model would require
explicit representation of all possible road harvest plans, which in this
case would number many millions. It is significant that the present model
not only optimises larger problems than those mentioned in Section 3, but
also requires much less computational time.

10. CONCLUSIONS 

The idea of developing an integrated model has been vindicated. The 
use of column generation in association with constraint branching has 
provided the technological power to achieve this end. The advantages of 
an integrated model that addresses both tactical and strategic issues are 
immediately apparent from the results of the case study presented above. 
When these are compared with the performance levels attained by other 
research workers addressing similar problems, as outlined in Section 3, it 
is seen that significant advances have been made. High quality objective 
values have been attained. Also the solution time required is very short. 
Application size appears to present no problem, with the appropriate 
numerical trials indicating no undue escalation of either elapsed user 
time, or memory requirements, when the size of the model is increased. 

The most significant line of associated research concerns the devel- 
opment of models for other applications with characteristics similar to 
those noted in Section 2. 

Abstract The problem of determining optimal equipment sizes and regimes of
chemical processes that guarantee flexibility of the processes under un- 
certainty is considered in this paper. The effect of multiextremality 
of the solution is investigated. A comparative analysis of determinis- 
tic methods available in the open literature has been carried out as a 
result. All the methods require some convexity (or concavity) assump- 
tions, which are difficult to verify for practical problems. To address 
the latter, we have developed algorithms with minimal convexity (or 
concavity) requirements for this problem. 

Keywords: Flexibility Analysis, Uncertainty Analysis, Chemical Process Design, 
Nonlinear Programming 

1. INTRODUCTION 

In chemical process (CP) design, certain design specifications, such as
(a) safety, (b) ecological, and (c) performance specifications, must
always be met. Such specifications are often formulated as either soft or
hard constraints, with the presumption that a violation of a hard
constraint is not allowed. In this paper, we confine ourselves to hard
constraints only.

The satisfaction of design specifications is complicated by the presence 
of uncertainties in design models, such as 

■ inherent inaccuracies of coefficients in the mathematical models, 

■ changes in some of the coefficients in the mathematical models 
during the CP operation (for example rate constants, heat and 
mass transfer coefficients), 

■ variations in some of the parameters (e.g. temperature, flow rates, 
species concentrations) associated with external streams during 
the CP operation. 

It is of critical importance for the safe and economic operation of the
process to account for the uncertainties discussed above when the
optimal design structure, equipment sizes and regimes of the CP
operation subject to design specifications are determined. Several authors
have addressed aspects of optimal design under uncertainty in the recent
literature (Biegler, Grossmann and Westerberg, 1997), (Halemane and
Grossmann, 1983), (Grossmann and Floudas, 1987), (Pistikopoulos and
Grossmann, 1989), (Pistikopoulos and Ierapetritou, 1995). Many
contributions in the operation literature, however, rely on the
availability of probability distribution functions for characterizing
parametric uncertainties. Since practicing engineers very rarely know
these distributions (without significant guesswork), we believe it is
important to consider ways of dealing with uncertainty which do not
require probability distributions to be known. Halemane and Grossmann
(1983) developed two formulations for analyzing the optimal design
problem under uncertainty. The first formulation is an evaluation of the
CP flexibility (its ability to satisfy process specifications during the
operation stage), which has the form



$$F_1(d) \le 0 \tag{1}$$

where the flexibility function $F_1(d)$ is of the form

$$F_1(d) = \max_{t \in T} \min_{z \in Z} \max_{j \in J} g_j(d, z, t). \tag{2}$$

Here, $J = \{1, \ldots, m\}$, $d$ is a vector of design variables, $z$ is
a vector of control variables, $Z$ is a region of admissible values of the
control variables, and

$$T = \{\, t : t^L \le t \le t^U \,\}$$

is the domain for the uncertain parameters. The reduced process
constraints

$$g_j(d, z, t) \le 0, \quad j = 1, \ldots, m \tag{3}$$

are obtained from the original mathematical model

$$\varphi(d, x, z, t) = 0 \tag{4}$$

$$g(d, x, z, t) \le 0 \tag{5}$$

by explicitly solving for the vector of state variables $x$, which has the
same dimension as $\varphi$. Equations (4) are state equations (i.e.
material and heat balance equations), while inequalities (5) are design
specifications.

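$F_1(d)$ can be represented in the form

$$F_1(d) = \max_{t \in T} h(d, t) \tag{6}$$

where

$$h(d, t) = \min_{z \in Z} \max_{j \in J} g_j(d, z, t). \tag{7}$$

Also

$$h(d, t) = \min_{z \in Z,\, u} u \quad \text{subject to} \quad g_j(d, z, t) \le u, \quad j = 1, \ldots, m. \tag{8}$$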





Therefore, the calculation of the flexibility function $F_1(d)$ is reduced
to the maximization of $h(d, t)$ with respect to $t$. Halemane and
Grossmann (1983) have shown that, in general, $h(d, t)$ is multiextremal
and non-differentiable.
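
As a purely illustrative sketch of the nested definition of the flexibility function (a maximum over $t$ of a minimum over $z$ of a maximum over $j$), the value can be approximated by brute force on grids for a toy problem. The constraint functions, the grids and the names `h` and `flexibility_function` are assumptions made for this example only; they are not taken from the paper, and a grid search only approximates the true value.

```python
def h(d, t, constraints, z_grid):
    """Approximate h(d, t) = min over z of max over j of g_j(d, z, t) on a grid of z."""
    return min(max(g(d, z, t) for g in constraints) for z in z_grid)


def flexibility_function(d, constraints, z_grid, t_grid):
    """Approximate F_1(d) = max over t of h(d, t) on a grid of t."""
    return max(h(d, t, constraints, z_grid) for t in t_grid)


# Hypothetical toy constraints g_j(d, z, t) <= 0 (not from the paper).
toy_constraints = [
    lambda d, z, t: t - z - d,   # the control z must absorb the disturbance t
    lambda d, z, t: z - 1.0,     # the control z is limited to at most 1
]

z_grid = [i / 100 for i in range(0, 201)]   # z in [0, 2]
t_grid = [i / 100 for i in range(0, 201)]   # t in [0, 2]

# A small design margin d leaves the worst case infeasible (F_1 > 0),
# a larger margin makes the design flexible (F_1 <= 0).
F1_small = flexibility_function(0.5, toy_constraints, z_grid, t_grid)
F1_large = flexibility_function(1.5, toy_constraints, z_grid, t_grid)
```

For this toy problem the inner problem can be solved by hand: the two constraints balance at $z = (t - d + 1)/2$, giving $h(d, t) = (t - d - 1)/2$, so the worst case is the largest $t$.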

The second formulation (of Grossmann et al.) is the two-stage
optimization problem under uncertainty (TSOP), which has the form



$$f_1 = \min_{d \in D} E_t[f^*(d, t)] \quad \text{subject to} \quad F_1(d) \le 0,$$

where $D$ is a region of admissible values of the vector $d$ and
$E_t[f^*(d, t)]$ is the mathematical expectation of $f^*(d, t)$ with
respect to $t$, with $f^*(d, t)$ given by

$$f^*(d, t) = \min_{z \in Z} f(d, z, t) \quad \text{subject to} \quad g_j(d, z, t) \le 0, \quad j = 1, \ldots, m.$$

Here $f(d, z, t)$ is the objective function in the original optimization
problem.

Using Gaussian quadrature (Carnahan, 1969) to approximate the mul- 
tiple integral in the objective function, one can reduce the latter problem 
to the following problem (Halemane and Grossmann, 1983) 



$$f_1 = \min_{d \in D,\, z^i} \sum_{i \in I_1} w_i f(d, z^i, t^i) \tag{9}$$

$$g_j(d, z^i, t^i) \le 0, \quad j = 1, \ldots, m, \quad i \in I_1,$$

$$F_1(d) \le 0,$$



where $w_i$ are weights, $t^i$ is an approximation point, $z^i$ is a
vector of control variables associated with the point $t^i$, and $I_1$ is
a set of indices of the approximation points. Notice that the
distribution functions which are required for the calculation of the
mathematical expectation are often unknown. In this case the weights and
approximation points must be selected using engineering insight. Similar
problems arise in the design of other technical systems such as
electrical circuits. Conceptually, the TSOP (9) determines the optimal
design margins that guarantee satisfaction of process specifications
under uncertainty, assuming the given uncertainty bounds are correct.
This guarantee, however, can only be given if the inequality
$F_1(d) \le 0$ holds for the global solution of the flexibility problem
(6). Therefore, it is appropriate that we discuss different deterministic
methods and the conditions under which the global solution of (6) is
achieved.
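
To illustrate how an expectation is replaced by a weighted sum over approximation points, as in problem (9), the sketch below uses the standard three-point Gauss-Legendre rule for a single parameter. For the sake of the example only, the parameter is assumed to be uniformly distributed on an interval; the function names and the integrand are likewise illustrative assumptions. As noted above, in practice the weights and points may instead have to be chosen by engineering insight.

```python
import math


def gauss_legendre_points(t_lower, t_upper):
    """Three-point Gauss-Legendre nodes and weights mapped to [t_lower, t_upper].

    The weights are normalised to sum to one, so that the weighted sum
    approximates an expectation under a uniform distribution (an assumption
    made only for this example).
    """
    nodes = [-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0)]
    weights = [5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0]   # these sum to 2 on [-1, 1]
    mid = 0.5 * (t_lower + t_upper)
    half = 0.5 * (t_upper - t_lower)
    points = [mid + half * x for x in nodes]
    probs = [w / 2.0 for w in weights]
    return points, probs


def expected_value(f, t_lower, t_upper):
    """Approximate E[f(t)] by the weighted sum of f over the approximation points."""
    points, probs = gauss_legendre_points(t_lower, t_upper)
    return sum(w * f(t) for w, t in zip(probs, points))


# The three-point rule is exact for polynomials up to degree 5,
# so E[t**4] on [0, 1] is reproduced up to rounding (exact value 1/5).
approx = expected_value(lambda t: t ** 4, 0.0, 1.0)
```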

2. COMPARATIVE ANALYSIS OF 
CURRENT METHODS 



In this section, we compare current methods for estimation of pro- 
cess flexibility and optimization of chemical processes under uncertainty. 
Specifically, we limit ourselves to the following methods: (i) the method 
of Halemane and Grossmann (1983) (the HG method), (ii) the active 
constraint sets method (the ACS method of Grossmann and Floudas, 
1987 and Pistikopoulos and Grossmann, 1989), (iii) the SG method of 
Swaney and Grossmann (1985), and (iv) the upper and lower bounds 
method (the ULB method of Ostrovsky and Volin, 1994, 1997). These 
methods are fairly representative of the research effort documented in 
the open literature, and they all employ local optimization methods. 

Let us consider the first method. Swaney and Grossmann (1985) have
shown that the global solution of the flexibility problem in (6) is
obtained at a vertex of the parameter set $T$ if the following condition
is met.

Condition 1: The functions $g_j(d, z, t)$ are jointly quasi-convex in $z$
and $d$. In addition they are one-dimensional quasi-convex in $t$.

We note that a function $f(x)$ is one-dimensional quasi-convex in $x$ if
it is quasi-convex separately with respect to each component of $x$. The
search for the maximum of $h(d, t)$ in $T$ can now be reduced to a search
among the vertices of $T$ (Halemane and Grossmann, 1983). The
computational effort of the method is, in general, proportional to the
number of vertices of $T$, $2^r$, where $r$ is the dimension of the vector
$t$. Swaney and Grossmann (1985) introduced the flexibility index, which
characterizes the largest uncertainty region that the design can handle
for feasible operation. For the case when each $g_j(d, z, t)$ is monotonic
in $t$, Swaney and Grossmann (1985) proposed an algorithm for the
flexibility index problem which uses a branch and bound (BB) strategy to
search among the vertices of $T$. Kabatek and Swaney (1992) suggested a
modification which permits non-vertex solutions to be found; however, the
procedure does not guarantee a global solution.
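
The vertex search described above can be sketched as a plain enumeration of the $2^r$ corners of the box $T$. The function names and the toy function `h_toy` are assumptions made for this illustration; `h_toy` stands in for $h(d, t)$ at a fixed design and is chosen convex in $t$ so that its maximum over the box lies at a vertex.

```python
from itertools import product


def box_vertices(lower, upper):
    """Return an iterator over all 2**r vertices of the box lower <= t <= upper."""
    return product(*zip(lower, upper))


def max_over_vertices(h, lower, upper):
    """Return the vertex with the largest value of h, and that value."""
    best = max(box_vertices(lower, upper), key=h)
    return best, h(best)


# Hypothetical stand-in for h(d, t) at a fixed design d (not from the paper).
h_toy = lambda t: (t[0] - 0.2) ** 2 + abs(t[1]) - 1.0

lower = [0.0, -1.0, 2.0]
upper = [1.0, 1.0, 3.0]

vertices = list(box_vertices(lower, upper))          # r = 3 gives 8 vertices
best_vertex, best_value = max_over_vertices(h_toy, lower, upper)
```

The number of evaluations doubles with every additional uncertain parameter, which is why full enumeration is only attractive when the dimension of $t$ is small.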

We next consider the ACS method. Under certain conditions, the
number of active constraints in problem (8) is equal to $q + 1$, where $q$
is the number of control variables. Based on this, Grossmann and Floudas
(1987) suggested a method for solving (2) as follows:

1 Identify all potentially active constraint sets $AS(k)$ ($k$ is the
index of the set) consisting of $q + 1$ constraints. Let us denote the
number of such sets as $n_{AS}$.

2 Solve the problem

$$u^k = \max_{t, z} u \quad \text{subject to} \quad g_j(d, z, t) = u, \quad j \in AS(k)$$

for all $k = 1, \ldots, n_{AS}$.

3 Determine

$$F_1 = \max_k u^k. \tag{10}$$
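
The three steps above can be sketched for the bookkeeping part only. The code below enumerates every subset of $q + 1$ constraint indices; this is an assumption made for illustration and gives only an upper limit on $n_{AS}$, since the method considers the potentially active sets, which may be fewer. The values `u_values` are hypothetical placeholders for the solutions of step 2, which would require a nonlinear programming solver.

```python
from itertools import combinations
from math import comb


def candidate_active_sets(m, q):
    """All subsets of q + 1 constraint indices out of j = 1, ..., m."""
    return list(combinations(range(1, m + 1), q + 1))


def flexibility_from_active_sets(u_values):
    """Step 3: F_1 is the largest u^k over all active sets considered."""
    return max(u_values)


active_sets = candidate_active_sets(m=5, q=2)
n_upper = comb(5, 3)   # number of subsets of size q + 1 = 3 out of m = 5

# Hypothetical results u^k of step 2, one per candidate set (not computed here).
u_values = [-0.4, -0.1, -0.7, -0.2, -0.9, -0.3, -0.5, -0.6, -0.8, -0.05]
F1 = flexibility_from_active_sets(u_values)
```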



Pistikopoulos and Grossmann (1989) generalized the method for the
retrofit design problem. Grossmann and Floudas (1987) showed that the
ACS method gives the global solution of problem (2) provided the
following condition is satisfied.

Condition 2:

1 Functions $g_j(d, z, t)$ are strictly quasi-convex in $z$ for fixed $t$.

2 Functions $g_j(d, z, t)$ are jointly quasi-concave in $z$ and $t$.

One can show that if Condition 2.1 is not satisfied, the ACS method
does not guarantee a local or global solution. The alternative ULB
method (Ostrovsky, Volin et al., 1994, 1997) uses the branch and bound
strategy for the calculation of $F_1$. It seeks the maximum of $h(d, t)$
over a partitioning of the region $T$ into subregions $T_l$ and is based
on the inequality (Ostrovsky, Volin et al., 1994)

$$F_{2l} \ge h(d, t), \quad \forall t \in T_l, \tag{11}$$

where

$$F_{2l} = \min_{z \in Z} \max_{j \in J} \max_{t \in T_l} g_j(d, z, t).$$

Consequently one can use $F_{2l}$ as an upper bound of $h(d, t)$ on
$T_l$. $F_{2l}$ can be evaluated by solving the problem

$$F_{2l} = \min_{z \in Z,\, u} u \quad \text{subject to} \quad \max_{t \in T_l} g_j(d, z, t) \le u, \quad j = 1, \ldots, m. \tag{12}$$

The ULB method for solving the TSOP is a two-level iterative procedure,
which employs a partitioning of $T$. The upper level serves to partition
$T$ using information obtained from the lower level. At iteration $k$ of
the upper level the lower level is used to calculate an upper bound
$f^{U,(k)}$ and a lower bound $f^{L,(k)}$ of $f_1$. Suppose at iteration
$k$ of the upper level the set $T$ is partitioned into subregions $T_l$,
$l = 1, \ldots, N_k$. On each $T_l$ the algorithm will automatically
update a set of critical points
$S_{2l} = \{ t^{l,i} : t^{l,i} \in T_l, \; i \in I_l \}$, where $I_l$ is
the index set of critical points on the subregion $T_l$. The upper bound
$f^{U,(k)}$ for a new set of subregions is calculated by solving the
problem

$$f^{U,(k)} = \min_{d,\, z^i,\, z^l} \sum_{i \in I_1} w_i f(d, z^i, t^i) \tag{13}$$

$$g_j(d, z^i, t^i) \le 0, \quad i \in I_1, \quad j = 1, \ldots, m,$$

$$\max_{t \in T_l} g_j(d, z^l, t) \le 0, \quad l = 1, \ldots, N_k, \quad j = 1, \ldots, m, \tag{14}$$

where $z^l$ is a control variables vector corresponding to the subregion
$T_l$ and $I_1$ is the set of approximation points (see (9)). The lower
bound $f^{L,(k)}$ is calculated at the lower level as

$$f^{L,(k)} = \min_{d,\, z^i,\, z^{l,i}} \sum_{i \in I_1} w_i f(d, z^i, t^i) \tag{15}$$

$$g_j(d, z^i, t^i) \le 0, \quad i \in I_1, \quad j = 1, \ldots, m,$$

$$g_j(d, z^{l,i}, t^{l,i}) \le 0, \quad t^{l,i} \in S_{2l}, \quad l = 1, \ldots, N_k, \quad j = 1, \ldots, m,$$

where $z^{l,i}$ is a vector of control variables corresponding to the
point $t^{l,i}$.

Now consider the upper level. At each iteration a partitioning of $T$ is
performed. The partitioning strategy strongly affects the computational
complexity of the procedure. The simplest approach is to partition all
subregions. However, the dimension of problems (13) and (15) will then be
very large. To alleviate this problem, we employ the following heuristic:
at the $k$-th iteration, $T_l$ ($l = 1, \ldots, N_k$) is partitioned only
if for this $l$ the constraints (14) are active for at least one $j$
($j = 1, \ldots, m$).
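
The partitioning heuristic can be sketched as follows. The text does not say how a subregion is split, so bisection along the longest edge is an assumption made here, as are the function names, the box representation and the activity tolerance `tol`.

```python
def bisect(box):
    """Split a box, given as a list of (low, high) pairs, along its longest edge."""
    widths = [hi - lo for lo, hi in box]
    k = widths.index(max(widths))
    lo, hi = box[k]
    mid = 0.5 * (lo + hi)
    left = box[:k] + [(lo, mid)] + box[k + 1:]
    right = box[:k] + [(mid, hi)] + box[k + 1:]
    return left, right


def refine_partition(subregions, constraint_max, tol=1e-8):
    """Bisect only those subregions on which some constraint (14) is active.

    constraint_max[l][j] is the value of the maximum of g_j over subregion l
    at the current solution; a constraint counts as active when this value
    is within tol of zero.
    """
    refined = []
    for box, values in zip(subregions, constraint_max):
        if any(abs(v) <= tol for v in values):
            refined.extend(bisect(box))
        else:
            refined.append(box)
    return refined


# Two hypothetical subregions in a two-parameter space; only the first one
# has an active constraint, so only the first one is split.
subregions = [[(0.0, 1.0), (0.0, 2.0)], [(1.0, 2.0), (0.0, 2.0)]]
constraint_max = [[-0.3, 0.0], [-0.5, -0.1]]
new_subregions = refine_partition(subregions, constraint_max)
```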

Next, consider the lower level. Problem (15) is a standard nonlinear
program that can be handled by standard algorithms. Problem (13),
however, is not a conventional NLP. Therefore, a two-level iterative
procedure is used for its solution. In the first step we solve the problem

$$f_2 = \min_{d,\, z^i,\, z^l} \sum_{i \in I_1} w_i f(d, z^i, t^i) \tag{16}$$

$$g_j(d, z^i, t^i) \le 0, \quad i \in I_1, \quad j = 1, \ldots, m,$$

$$g_j(d, z^l, t^{l,i}) \le 0, \quad t^{l,i} \in S_{2l}, \quad l = 1, \ldots, N_k, \quad j = 1, \ldots, m.$$

Let $[d^*, z^{i*}, z^{l*}]$ be the solution to the problem, and let $n$ be
the iteration counter at the lower level. We note that in problem (16),
the control vector $z^l$ is associated with all the critical points of
$T_l$, whereas in problem (15) there is a control vector $z^{l,i}$ for
each point $t^{l,i}$.

During the second step, a stopping criterion is checked and some of
the sets of critical points are extended. For this, the following $m N_k$
problems are solved

$$\max_{t \in T_l} g_j(d^*, z^{l*}, t), \quad l = 1, \ldots, N_k, \quad j = 1, \ldots, m. \tag{17}$$

In the case when it is not possible to obtain an explicit expression
for state variables as a function of the variables $[d, z, t]$, problem
(17) is equivalent to

$$\max_{x,\, t \in T_l} g_j(d^*, x, z^{l*}, t) \quad \text{subject to} \quad \varphi(d^*, x, z^{l*}, t) = 0. \tag{18}$$

For fixed $l$ and $j$, let $t^{l,j}$ be the solution of the problem. If
the condition

$$g_j(d^*, z^{l*}, t^{l,j}) \le 0, \quad l = 1, \ldots, N_k, \quad j = 1, \ldots, m \tag{19}$$

is met, then the solution of problem (13) has been obtained. Otherwise
those points $t^{l,j}$ for which conditions (19) are violated are added to
the sets of critical points of the corresponding subregions. We will
assume that a local optimization method (for example SQP) is used for
solving problems (15), (16) and (17). Now we will show that the ULB
method gives the global solution of problems (2) and (9) under the
assumption of

Condition 3 

1 Functions $g_j(d, z, t)$ are jointly quasi-convex in $d$ and $z$ (for
problem (2) quasi-convexity in $z$ is sufficient).

2 Functions $g_j(d, z, t)$ are quasi-concave in $t$.

3 Function $f(d, z, t)$ is quasi-convex in $d$ and $z$.

Let us consider problem (2). The ULB method gives the global solution
if the upper (see problem (13)) and lower bounds of $h(d, t)$ are
global solutions. Rewrite the problem (12) in the form

$$F_{2l}(d) = \min_{z,\, u} u \tag{20}$$

$$G_j(d, z) \le u \tag{21}$$

$$G_j(d, z) = \max_{t \in T_l} g_j(d, z, t) \tag{22}$$



By Condition 3.2, a local maximum of problem (22) coincides with
the global maximum (Bazaraa and Shetty, 1979). One can show that if
$g_j$ is quasi-convex in $z$ (Condition 3.1), then $G_j$ is also
quasi-convex in $z$. It follows that the region determined by Eqn. (21) is
convex and a local minimum of problem (20) coincides with the global
minimum (Bazaraa et al., 1993). Moreover, the local minimum of problem
(8) coincides with its global minimum. Thus, by solving problems (20) and
(8), we will obtain the global solutions, i.e., the valid upper and lower
bounds for subregion $T_l$. Therefore, the branch and bound procedure must
give the global solution. Similarly, one can show that the local minimum
of problem (9) coincides with the global minimum if Condition 3 is
satisfied.

If Condition 3.2 holds, which is the case in many applications, but
Condition 3.1 is not satisfied, then our procedure only guarantees an
upper bound for $F_1$. Indeed, in this case we obtain a local minimum
$F_{2l,\mathrm{loc}}$ of the problem (20) associated with problem (2), and
since $F_{2l,\mathrm{loc}} \ge F_{2l} \ge h(d, t)$ on $T_l$, our upper
bound estimate is less tight.

Let us now consider two scenarios for the TSOP. If, on the one hand,
the ULB method yields a solution, then this solution corresponds to the
global solution of problem (17), since we have assumed that Condition
3.2 is satisfied. Hence, by (19), we can guarantee the flexibility of the
chemical process (CP). However, since the solution is in general a local
minimum of problem (9), the design, although flexible, may not be the
best.

The second scenario corresponds to the case when the ULB method
cannot obtain a solution. This means that the following condition is met

$$\max_{t \in T} \min_{z \in Z} \max_{j \in J} g_j(d, z, t) > 0 \quad \forall d.$$

This can result from one of the following: (a) the flexibility of the CP
cannot be guaranteed or (b) the solution corresponds to a local minimum
of $F_{2l}$.

Comparison of several methods with respect to their ability to obtain a
global solution and their computational complexity leads us to conclude
that they all supplement each other to some extent. Accordingly, we
recommend the following:

1 if Condition 1 is met and the dimension of the vector t is small, 
the HG method should be employed; 

2 if the dimension of t is not small and each gj is monotonic, the SG 
method is appropriate; 

3 if the number of active constraint sets in problem (8) is not large 
and Condition 2 is met, it is reasonable to use the ACS method; 
and 

4 if Conditions 1 and 2 are not met but Condition 3 is met, one 
should use the ULB method. 
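
The four recommendations above can be read as a simple decision procedure. The sketch below encodes them in the order in which they are listed; the function name, the boolean inputs and the choice to return `None` when no recommendation applies are assumptions made for this illustration.

```python
def recommend_method(cond1, cond2, cond3, small_dim_t, monotonic_g, few_active_sets):
    """Return the recommended method name, or None if no recommendation applies.

    cond1, cond2, cond3 state whether Conditions 1, 2 and 3 are met;
    small_dim_t states whether the dimension of t is small;
    monotonic_g states whether each g_j is monotonic in t;
    few_active_sets states whether the number of active constraint sets is not large.
    """
    if cond1 and small_dim_t:
        return "HG"
    if not small_dim_t and monotonic_g:
        return "SG"
    if few_active_sets and cond2:
        return "ACS"
    if not cond1 and not cond2 and cond3:
        return "ULB"
    return None


choice_a = recommend_method(True, False, False, True, False, False)
choice_b = recommend_method(False, False, True, True, False, False)
choice_c = recommend_method(False, False, False, True, False, False)
```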

Using the above methods a larger class of problems can be solved.
However, there is an inherent drawback of all the methods discussed
earlier. For many realistic problems, it is very difficult, if not
impossible, to check the convexity (concavity) of $f$ and $g_j$. Even if
it were possible to make such a check, the functions may turn out to be
neither convex nor concave. Thus the above methods cannot guarantee
the flexibility of most realistic CPs. To address this issue, we will now
consider methods with a minimal dependence on knowledge of convexity
(concavity) properties of the functions $f$ and $g_j$.

3. MODIFICATION OF THE ULB METHOD 

We have developed two modifications of the ULB method, which will
obtain a solution of the TSOP with guaranteed flexibility for the case
when Condition 3.2 is not met. The flexibility of the CP can be guaranteed
if we can find global solutions to problem (17). In light of this, let us
rewrite (17) and (19) in the form

$$\max_{t \in T_l} g_j(d^*, z^{l*}, t) \le 0, \quad l = 1, \ldots, N_k, \quad j = 1, \ldots, m. \tag{23}$$

These conditions are equivalent to

$$g_j(d^*, z^{l*}, t) \le 0, \quad j = 1, \ldots, m, \quad \forall t \in T_l, \quad l = 1, \ldots, N_k. \tag{24}$$

The conditions in (24) mean that for each subregion $T_l$
($l = 1, \ldots, N_k$) we are able to find a vector of control variables
$z^{l*}$ which guarantees satisfaction of all the constraints in (3). It
should be noted that the dimensionality of problem (17) is less than the
dimensionality of problem (9). However, finding the global maximum at each
iteration is expected to be very computationally intensive. Consequently,
we suggest the following modification of the ULB method. First, let us
consider the flexibility function $F_1$. In subregion $T_l$, we will use
as upper bound for $h(d, t)$ the value

$$\bar{F}_{2l} = \min_{u,\, z \in Z} u \quad \text{subject to} \quad \max_{t \in T_l} U(g_j; T_l) \le u, \quad j = 1, \ldots, m, \tag{25}$$

where $U(g_j; T_l)$ is a concave overestimator of $g_j$ in $T_l$ with
respect to $t$ satisfying the conditions

$$U(g_j; T_l) \ge g_j, \quad \forall t \in T_l. \tag{26}$$

If each $g_j$ is bilinear, then McCormick's (1976) technique can be used
to construct $U(g_j; T_l)$, whilst for general polynomial functions $g_j$,
Sherali and Alameddine's (1992) linearization-reformulation approach is
applicable.
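
For a single bilinear term $xy$ on a box, the standard McCormick concave overestimator is the smaller of two planes. The sketch below is a minimal illustration with assumed function names; it checks property (26) on a grid and shows that the largest gap shrinks with the box, which for a bilinear term equals one quarter of the product of the two edge lengths.

```python
def mccormick_over(x, y, x_lo, x_hi, y_lo, y_hi):
    """Concave overestimator of x*y on the box [x_lo, x_hi] x [y_lo, y_hi]."""
    return min(x_hi * y + x * y_lo - x_hi * y_lo,
               x_lo * y + x * y_hi - x_lo * y_hi)


def max_gap(x_lo, x_hi, y_lo, y_hi, n=20):
    """Largest value of (overestimator - x*y) on an (n+1) x (n+1) grid of the box."""
    gap = 0.0
    for i in range(n + 1):
        x = x_lo + (x_hi - x_lo) * i / n
        for k in range(n + 1):
            y = y_lo + (y_hi - y_lo) * k / n
            gap = max(gap, mccormick_over(x, y, x_lo, x_hi, y_lo, y_hi) - x * y)
    return gap


gap_big = max_gap(0.0, 2.0, 0.0, 2.0)      # edge lengths 2 and 2
gap_small = max_gap(0.9, 1.1, 0.9, 1.1)    # edge lengths 0.2 and 0.2
```

The two planes follow from $(x^U - x)(y - y^L) \ge 0$ and $(x - x^L)(y^U - y) \ge 0$, which both hold everywhere on the box.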

Comparing problems (12) and (25) and taking into account (11), it is
easy to obtain the following inequality

$$\bar{F}_{2l} \ge F_{2l} \ge h(d, t).$$

We will assume that $U(g_j; T_l)$ satisfies the condition

$$\lim_{r(T_l) \to 0} \max [U(g_j; T_l) - g_j(d, z, t)] = 0, \tag{27}$$

where $r(T_l)$ is a measure of the size of $T_l$. This means that, as
$T_l$ decreases in size, $U(g_j; T_l)$ gets closer to $g_j$. Consequently
$\bar{F}_{2l}$ becomes a more accurate estimate of $h(d, t)$. In the limit
we obtain from (27) that

$$\lim_{r(T_l) \to 0} \bar{F}_{2l} = h(d, t_l) \tag{28}$$

and $T_l$ reduces to a point $t_l$.

It is clear that the modification can be used if Condition 1 is satisfied.
This modification is expected to be superior to the HG method since
it does not require full enumeration. It is also expected that it will be
superior to the SG method since it can be applied to a broader class of
problems.

Let us revisit the ULB method for solving the TSOP. In order to
obtain the upper bound of the objective function, we will solve
the problem

$$f_3 = \min_{d,\, z^i,\, z^l} \sum_{i \in I_1} w_i f(d, z^i, t^i) \tag{29}$$

$$g_j(d, z^i, t^i) \le 0, \quad i \in I_1, \quad j = 1, \ldots, m,$$

$$\max_{t \in T_l} U[g_j(d, z^l, t); T_l] \le 0, \quad l = 1, \ldots, N_k, \quad j = 1, \ldots, m,$$

and replace (17) by

$$\max_{t \in T_l} U[g_j(d^*, z^{l*}, t); T_l] \tag{30}$$

where $d^*, z^{l*}$ is a solution of problem (16). Of course we obtain an
upper bound which is worse than the upper bound from (13). It is easy to
see that if the following condition

$$\max_{t \in T_l} U[g_j(d^*, z^{l*}, t); T_l] \le 0 \tag{31}$$

is met, then condition (23) is met as well. Therefore, in the ULB method,
we only need to replace (17) (i.e. maximization of each $g_j$) with the
maximization of the concave overestimator of each $g_j$ (i.e. by (30)).
Now consider the case when it is impossible to obtain an explicit
expression for the state variable $x = x(d, z, t)$. For a calculation of
an upper bound in this case it is necessary to solve problem (18) instead
of problem (17).
To formulate the equivalent of problem (30) we first replace each equality
constraint in (18) with two inequality constraints, which results in

$$\max_{x,\, t \in T_l} g_j(d^*, x, z^{l*}, t)$$

$$\varphi(d^*, x, z^{l*}, t) \le 0$$

$$-\varphi(d^*, x, z^{l*}, t) \le 0.$$

Now problem (31) for calculating the upper bound will be equivalent to
solving the problem

$$\max_{x,\, t \in T_l} U[g_j(d^*, x, z^{l*}, t)]$$

$$L[\varphi(d^*, x, z^{l*}, t)] \le 0$$

$$L[-\varphi(d^*, x, z^{l*}, t)] \le 0$$

where $L[\varphi(d^*, x, z^{l*}, t)]$ is a convex lower estimator of
$\varphi$ with respect to $t$.

Since the overestimator $U(g_j; T_l)$ tends to $g_j$ as $T_l$ becomes
infinitesimal, its use will not result in an increase in the number of
iterative levels of the ULB method. Thus the computational complexity is
not increased by the modification. We refer to the current modification as
ULBG1. Suppose Condition 3.1 is not met for ULBG1; then it is easy
to show that a solution of the TSOP still guarantees the flexibility of
the CP. In this case, we should note that the solution is in general a
local minimum from the point of view of the objective function (9).

Let us consider a second modification, ULBG2, of the ULB method.
We have noted already that if the ULB method can obtain the solution
of the TSOP then in many cases the solution will guarantee flexibility
of the CP. However, we need an unambiguous answer with regard to
flexibility. Thus, after obtaining the solution, we need to solve
problem (17) using a global optimization method, specifically a branch and
bound strategy. If condition (23) is met, then the solution guarantees
CP flexibility. Otherwise the points at which condition (23) is
violated are added to the corresponding sets of critical points $S_{2l}$.
In the computational experiments we have done so far, this procedure
involved only one iteration in most of the cases.
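
The global check of condition (23) by branch and bound can be sketched in one dimension. The sketch below is not the authors' implementation: the constraint `g_toy`, the bound `ub_toy` and the function name are assumptions made for illustration, and a simple interval-type upper bound stands in for a concave overestimator. The search keeps bisecting the subinterval with the largest upper bound until no subinterval can improve on the best value found by more than `tol`.

```python
import heapq


def branch_and_bound_max(g, upper_bound, lo, hi, tol=1e-4):
    """Maximise g on [lo, hi], given upper_bound(a, b) >= max of g on [a, b].

    Returns the best point found and its value; the global maximum
    exceeds the returned value by at most tol.
    """
    best_t, best_val = lo, g(lo)
    if g(hi) > best_val:
        best_t, best_val = hi, g(hi)
    heap = [(-upper_bound(lo, hi), lo, hi)]   # largest upper bound is popped first
    while heap:
        neg_ub, a, b = heapq.heappop(heap)
        if -neg_ub <= best_val + tol:
            break   # no remaining subinterval can improve on the incumbent
        mid = 0.5 * (a + b)
        val = g(mid)
        if val > best_val:
            best_t, best_val = mid, val
        for c, e in ((a, mid), (mid, b)):
            ub = upper_bound(c, e)
            if ub > best_val + tol:
                heapq.heappush(heap, (-ub, c, e))
    return best_t, best_val


# Hypothetical constraint g(t) = t (1 - t) - 0.3 on T = [0, 1] (not from the paper).
g_toy = lambda t: t * (1.0 - t) - 0.3
# For 0 <= a <= t <= b <= 1 we have t <= b and 1 - t <= 1 - a, so this bound is valid.
ub_toy = lambda a, b: b * (1.0 - a) - 0.3

t_star, g_max = branch_and_bound_max(g_toy, ub_toy, 0.0, 1.0)
constraint_holds_everywhere = g_max + 1e-4 <= 0.0
```

If the check fails, the maximiser `t_star` is the natural candidate to add to the set of critical points of the corresponding subregion.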

4. COMPUTATIONAL EXPERIMENTS 

In this section the proposed algorithms are applied to the optimization
of a flowsheet consisting of a reactor and a heat exchanger (Fig. 1, from
Halemane and Grossmann, 1983).

Figure 1 Flowsheet for example.

The reaction is assumed to be first-order exothermal of the type
$A \to B$. The flow rate through the heat exchanger loop is adjusted to
maintain the reactor temperature below $T_{1\max}$ and to get a minimum
of 90% conversion. The latter is given by
$conv = (c_{A0} - c_{A1})/c_{A0}$. The performance equations of such a
system are as follows.



Reactor Material and Heat Balance

$$F_0 (c_{A0} - c_{A1})/c_{A0} = V k_R \exp(-E/RT_1)\, c_{A1} \tag{32}$$

$$(-H) F_0 (c_{A0} - c_{A1})/c_{A0} = F_0 C_p (T_1 - T_0) + Q_{HE}$$

Heat Exchanger Heat Balance and Design Equations

$$Q_{HE} = F_1 C_p (T_1 - T_2) = C_{pw} (T_{w2} - T_{w1}) W$$

$$Q_{HE} = A U \frac{(T_1 - T_{w2}) - (T_2 - T_{w1})}{\ln\{(T_1 - T_{w2})/(T_2 - T_{w1})\}} \tag{33}$$

where $F_0$, $T_0$, $c_{A0}$ are the feed flow rate ([=] kgmol h$^{-1}$),
the temperature of the feed ([=] K) and the concentration of the reactant
in the feed ([=] kgmol m$^{-3}$), respectively; $V$, $T_1$, $c_{A1}$ are
the values of the reactor volume ([=] m$^3$), the reactor temperature
([=] K) and the concentration of the reactant A in the product
([=] kgmol m$^{-3}$); $H$ is the heat of the reaction
([=] kJ kgmol$^{-1}$); $F_1$ is the flow rate of the recycle
([=] kgmol h$^{-1}$); $T_2$ is the recycle temperature;
$C_p = 167.4$ kJ kgmol$^{-1}$ and $C_{pw} = 4.19$ kJ kgmol$^{-1}$ are the
heat capacities of the recycle mixture and the cooling water,
respectively. $T_{w1}$, $T_{w2}$, $W$ are the inlet and outlet
temperatures and the flow rate (kg h$^{-1}$) of the cooling water,
respectively. $A$ ([=] m$^2$) is the heat transfer area of the heat
exchanger and $U$ ([=] kJ m$^{-2}$ h$^{-1}$ K$^{-1}$) is the overall heat
transfer coefficient.

In this problem the following constraints apply

$$-conv + 0.9 \le 0 \tag{34}$$

$$conv - 1 \le 0 \tag{35}$$

$$-(T_2 - T_{w1}) + 11.1 \le 0 \tag{36}$$

$$T_2 - 389 \le 0 \tag{37}$$

$$-T_2 + 311 \le 0 \tag{38}$$

$$T_2 - T_1 \le 0 \tag{39}$$

$$-T_{w2} + T_{w1} \le 0 \tag{40}$$

$$311 \le T_1 \le 389 \tag{41}$$

$$301 \le T_{w2} \le 355. \tag{42}$$



The objective function of the original optimization problem is of the
form

$$f = 691.2 V^{0.7} + 873.6 A^{0.6} + 1.76 W + 7.056 F_1. \tag{43}$$

The design variables are $V$ and $A$. The control variables are $T_1$ and
$T_{w2}$. The vector of state variables is $[c_{A1}, T_2, F_1, W]$. The
vector of uncertain parameters is $t = [F_0, T_0, T_{w1}, k_R, U]$.
Finally, the uncertainty region $T$ is given by

$$T(\gamma) = \{\, t : t_i^N (1 - \gamma\, \delta t_i) \le t_i \le t_i^N (1 + \gamma\, \delta t_i) \,\} \tag{44}$$

where $t^N = (45, 333, 300, 9.8, 1635)$ is the nominal value of the
uncertain parameters, $\gamma$ is the parameter that determines the size
of the uncertain parameter range, and
$\delta t = (0.1, 0.02, 0.03, 0.1, 0.1)$ is a deviation fraction.
The case $\gamma = 1$ was investigated by Halemane and Grossmann (1983)
and Ostrovsky and Volin (1994).
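
The bounds of the uncertainty region for a given size parameter follow directly from the nominal values and deviation fractions quoted above. The short sketch below evaluates them; the function and variable names are assumptions made for this illustration.

```python
NAMES = ("F0", "T0", "Tw1", "kR", "U")
NOMINAL = (45.0, 333.0, 300.0, 9.8, 1635.0)    # nominal values of the uncertain parameters
DEVIATION = (0.1, 0.02, 0.03, 0.1, 0.1)        # deviation fractions


def uncertainty_box(gamma):
    """Lower and upper bound of every uncertain parameter for a given size parameter gamma."""
    return [(tn * (1.0 - gamma * dt), tn * (1.0 + gamma * dt))
            for tn, dt in zip(NOMINAL, DEVIATION)]


box_1 = uncertainty_box(1.0)
box_175 = uncertainty_box(1.75)
n_vertices = 2 ** len(NOMINAL)   # number of vertices of the box

for name, (lo, hi) in zip(NAMES, box_1):
    print(f"{name}: [{lo:.2f}, {hi:.2f}]")
```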

After elimination of the dependent (i.e. state) variables $c_{A1}$, $T_2$,
$F_1$, $W$ using equations (32) and (33), we obtain

$$conv = \frac{V k_R c_{A0} \exp(-E/RT_1)}{F_0 + V k_R c_{A0} \exp(-E/RT_1)}$$

$$T_2 = \frac{2(-H) F_0\, conv}{AU} - \frac{2 F_0 C_p (T_1 - T_0)}{AU}$$
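
The conversion expression can be checked against the reactor material balance (32) numerically. In the sketch below the values of `c_A0`, `E_over_R` and `T1` are hypothetical placeholders chosen only to make the example run; the values of `V`, `k_R` and `F0` are the nominal figures quoted in this section. The function name is likewise an assumption.

```python
import math


def conversion(V, k_R, c_A0, F0, E_over_R, T1):
    """Conversion obtained by solving the material balance (32) for conv."""
    rate = V * k_R * c_A0 * math.exp(-E_over_R / T1)
    return rate / (F0 + rate)


# V, k_R and F0 are taken from the text; the remaining values are hypothetical.
V, k_R, F0 = 5.42, 9.8, 45.0
c_A0, E_over_R, T1 = 30.0, 500.0, 380.0

conv = conversion(V, k_R, c_A0, F0, E_over_R, T1)

# Residual of balance (32) after substituting c_A1 = c_A0 (1 - conv);
# dividing (32) through shows it reduces to F0 * conv = V k_R exp(-E/RT1) c_A1.
c_A1 = c_A0 * (1.0 - conv)
residual = F0 * conv - V * k_R * math.exp(-E_over_R / T1) * c_A1
```

A larger reactor volume gives a higher conversion in this expression, which is consistent with the larger volumes reported for larger uncertainty regions.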



Substituting the expressions into the constraints (34) to (40), we
obtain explicit expressions for $g_j$ with respect to the control
variables and the uncertain parameters. These expressions contain linear,
bilinear and trilinear terms with respect to the uncertain parameters. In
Table 1, we give the approximation points from (Halemane and Grossmann,
1983). Here, N, L, U designate nominal, lower and upper bounds.

$F_0$    $T_0$    $T_{w1}$    $k_R$    $U$
N        N        N           N        N
L        L        L           L        U
U        U        U           U        L
U        U        L           U        L
U        U        U           L        L

Table 1 Approximation points: N - nominal value, L - lower bound, U - upper bound



We solved the problem for different numbers of approximation points 
and different sizes of the uncertainty region. In Table 2 we present results 
for (a) the nominal values of the uncertain parameters and (b) different 
sizes of the uncertainty region. In (9) we used five approximation points 
from Table 1. In the last column, the number of iterations used in the 
ULB method is presented. 





                                                            $\gamma$   $f_1$      $V$     $A$      # Iter
Optimization under Nominal values of Uncertain Parameters   -          9003.62    5.42    5.21     1
Optimization under Uncertainty                              1.0        10670.7    6.63    7.77     1
Optimization under Uncertainty                              1.25       11187.5    6.97    8.57     2
Optimization under Uncertainty                              1.50       11776.5    7.34    9.45     5
Optimization under Uncertainty                              1.75       12413.2    7.72    10.42    5

Table 2 Results for different sizes of uncertainty region



In Table 3 we present TSOP results for different numbers ($I_1$) of
approximation points for $\gamma = 1$. We solved two variants,
corresponding to five approximation points and to the first three
approximation points from Table 1. If we compare the results of nominal
optimization and optimization under uncertainty for $\gamma = 1$ and five
approximation points, we conclude that it is necessary to increase the
reactor volume by 25% and the heat exchange area by 22%. However, the
bilinear and trilinear terms are sources of multiextremality in problem
(17). As such we cannot guarantee flexibility of the process (i.e.
Condition 3.2 cannot be met).

                                                            $I_1$   $f_1$      $V$     $A$     # Iter
Optimization under Nominal values of Uncertain Parameters   -       9003.62    5.42    5.21    1
Optimization under Uncertainty                              5       10670.7    6.63    7.77    1
Optimization under Uncertainty                              3       10033.7    6.05    7.67    1

Table 3 Results for $\gamma = 1$ and different numbers of approximation points

In light of this we solved the problem using the ULBG1 and ULBG2
methods. The ULBG1 method requires construction of concave overestimators
$U(g_j)$ of $g_j$. Since the latter contain bilinear and trilinear terms,
we must construct overestimators for these terms. For bilinear terms
we used McCormick's (1983) expressions for overestimators and for
trilinear terms we used as overestimators expressions from (Maranas and
Floudas, 1995) (an extension of the overestimator for a bilinear term).
We solved for $I_1 = 5$ and $\gamma = (1, 1.25, 1.5)$. Results are given
in Table 4. In addition, we solved for $\gamma = 1$ and $I_1 = (5, 3)$.
Results are given in Table 5.

                                                            $\gamma$   $f_1$      $V$     $A$     # Iter
Optimization under Nominal values of Uncertain Parameters   -          9003.62    5.42    5.21    1
Optimization under Uncertainty                              1.0        10670.7    6.63    7.77    1
Optimization under Uncertainty                              1.25       11187.5    6.97    8.57    1
Optimization under Uncertainty                              1.50       11776.5    7.34    9.45    1

Table 4 Results with the ULBG1 method

                                                            $I_1$   $f_1$      $V$     $A$     # Iter
Optimization under Nominal values of Uncertain Parameters   -       9003.62    5.42    5.21    1
Optimization under Uncertainty                              5       10670.7    6.63    7.77    1
Optimization under Uncertainty                              3       10033.7    6.05    7.67    1

Table 5 Results with the ULBG2 method



Comparing the results from Table 4 with those of Table 2, and the
results from Table 5 with those of Table 3, shows that we can ensure
flexibility of the process. It is interesting to note that the use of the
overestimators does not increase the number of iterations.

In the ULBG2 method, we must take the values of design variables
obtained by the ULB method and use them as initial points for global
optimization of problems (17). For calculation of an upper bound of
the functions $g_j$ we used the overestimators, which were constructed by
the technique described above. We solved the TSOP for the case when five
approximation points are used and $\gamma = (1, 1.25, 1.5)$.
Global optimization of all the constraints for all the cases showed that
the solutions found by the ULB method are the global maxima of $g_j$
with respect to the uncertain parameters. Consequently, the solutions
obtained by the ULB method guarantee flexibility of the process. It is
interesting to note that in all the cases the branch and bound global
optimization method found global maxima in one iteration.

5. DISCUSSIONS AND CONCLUSIONS 

The aim of solving the two-stage optimization problem is to
determine optimal design margins that guarantee the preservation of
capacity for the operation of chemical processes (flexibility of chemical
processes) in spite of model uncertainties at the design stage and
process uncertainties at the operation stage. The design margins obtained
can, however, be used in practice only if the solution of the two-stage
optimization problem under uncertainty (TSOP) corresponds to the global
solution of the flexibility problem. We have therefore carried out a
comparative analysis of three deterministic methods of flexibility analysis
and optimization of chemical processes under uncertainty.

Our analysis showed that the methods discussed in this paper
supplement each other to some degree. Given this range of tools, one can solve
a wide class of optimization problems. However, the methods have an
inherent drawback. For many real problems, it is very difficult
to check the convexity (concavity) of the constraint and objective
functions. Even if it were possible to carry out such a check, the functions
may turn out to be neither convex nor concave. As a result, there is a
need to develop methods that have minimal requirements on convexity
(concavity) of the constraint and objective functions.

We have developed two significant modifications of the ULB method
which permit us to obtain the solution of the TSOP guaranteeing
flexibility of CP under a single condition, namely the convexity of the constraint
functions $g_j(d, z, \theta)$ in $z$. Any solution obtained by the algorithm will
guarantee flexibility of the CP even though the solution may correspond
to a local minimum of the TSOP.
Abstract A well-known problem in the theory of differential games is the
“homicidal chauffeur” problem, which was introduced by Isaacs [7]. It is
a pursuit-evasion game. In this paper, a variant of this problem proposed
by Bernhard [3] is considered. The computation of level sets of the value
function in this variant becomes difficult since holes in the “victory
domains” of the pursuer can appear. Some results of the computation
of level sets of the value function are presented. An explanation of
the generation of holes is given, based on the analysis of families of
semipermeable curves.



1. INTRODUCTION 

A differential game where an inertial object pursues a non-inertial 
one is considered. The dynamics of the game are similar to those of the
classical homicidal chauffeur game of R. Isaacs [7, 4, 11]. The difference
is that the evader must apply a reduced speed (in order not to be heard 
by the pursuer) when the distance between him and the pursuer becomes 
less than a given value. The idea of such a modification was suggested in 
[3]. The pursuer minimizes the time of capture and the evader maximizes 
it. The game is over when the evader gets into a given neighborhood of 
the state of the pursuer (capture neighborhood). 

M.J.D. Powell and S. Scholtes (Eds.), System Modelling and Optimization: Methods, Theory and 
Applications. © 2000 IFIP. Published by Kluwer Academic Publishers. All rights reserved. 
In [6], level sets of the value function for particular values of the
parameters of the problem were computed using an algorithm based
on viability theory. The solution to the problem has a complicated 
structure: holes in the solvability set (in the victory domain) of the 
pursuer can arise, the evader being safe from the pursuer within these 
holes. 

The investigation of such complex structures of solutions is of great 
interest for viability theory and the theory of differential games. In 
the problem considered, the geometry of level sets of the value func- 
tion differs from the one that was analyzed for other problems with the 
homicidal chauffeur dynamics [7, 4, 11, 8, 15]. 

In this paper, the problem is studied using an algorithm proposed by 
the authors for computing level sets of the value function. The algorithm 
is based on the theory of differential games [9, 10]. The dependence of 
the structure of the solution on the parameters of the problem is inves- 
tigated. The computations are done in the plane because a change of 
variables can reduce the dimension of the original problem to two [7]. 
The algorithm uses specific properties of the plane and is very accurate. 
It allows one to explore some fine peculiarities of the solution. Addition- 
ally, the analysis of families of so-called semipermeable curves is used to 
explain the occurrence of holes. 

2. STATEMENT OF THE PROBLEM

The dynamics of the game in reduced coordinates has the form [7, 6]:

$$\dot{x}_1 = -\frac{w^{(1)}}{R}\, x_2 \varphi + v_1, \qquad \dot{x}_2 = \frac{w^{(1)}}{R}\, x_1 \varphi + v_2 - w^{(1)}, \qquad (1)$$

where $|\varphi| \le 1$ and $v \in Q(x)$.

Here $(x_1, x_2)'$ is the state vector which gives the relative position of the
evader E with respect to the pursuer P, and $w^{(1)}$ and $R$ are constants
which define the pursuer’s velocity and the minimal radius of turn,
respectively. The control of player P is $\varphi$, and the control of the evader
E is $v = (v_1, v_2)'$.

The vector $v$ belongs to the circle $Q(x)$ with center at the origin and
radius $w^{(2)}(x) = \min\{(x_1^2 + x_2^2)^{1/2}, s\}\, w_e / s$, where $w_e$ is the maximal
value of the velocity of player E, and $s$ is a fixed positive number. Thus
the radius of the constraint $Q(x)$ on the control of player E is constant
and equal to $w_e$ outside the circle of radius $s$ with center at the origin,
but the radius is proportional to $|x|$ inside this circle.
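
As a minimal sketch of the model just described, the following Python functions evaluate the radius of $Q(x)$ and the right-hand side of (1). The names `evader_radius` and `rhs`, and the value of the pursuer's velocity `w1` used in the check, are assumptions made for illustration.

```python
import math


def evader_radius(x, w_e, s):
    """Radius of the disc Q(x): w_e outside the circle |x| = s, w_e*|x|/s inside."""
    return min(math.hypot(x[0], x[1]), s) * w_e / s


def rhs(x, phi, v, w1, R):
    """Right-hand side of (1) for state x, pursuer control phi and evader control v."""
    if abs(phi) > 1.0:
        raise ValueError("pursuer control must satisfy |phi| <= 1")
    x1, x2 = x
    return (-w1 * x2 * phi / R + v[0], w1 * x1 * phi / R + v[1] - w1)


if __name__ == "__main__":
    R, s, w_e = 0.8, 0.75, 0.4   # R and s as in the computations of Section 6
    w1 = 1.0                     # assumed value, for illustration only
    x = (0.3, 0.4)
    r = evader_radius(x, w_e, s)
    print(r, rhs(x, 1.0, (r, 0.0), w1, R))
```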

The terminal set $M$ is the rectangle $\{(x_1, x_2) \in R^2 : -3.5 \le x_1 \le 3.5,\ -0.2 \le x_2 \le 0\}$.
The objective of the control $\varphi$ is to minimize the
time of attaining the terminal set $M$, but the objective of the control
$v = (v_1, v_2)'$ is to maximize this time. Therefore the payoff of the game
is the time of attaining the terminal set.

The statement of the problem was taken from [6]. In the classical 
statement [7] of the problem, the terminal set (capture neighborhood) 
is a circle. A circle can be used in the acoustic version too. However, 
more interesting cases from the mathematical point of view arise when 
the capture neighborhood is a rectangle with its horizontal side much 
greater than its vertical side. 

The game is treated within the formalization of [9, 10]. We are
interested in finding the level sets $W(T, M)$, $T > 0$. Each of them is the set
of all initial states $x_0$ in the plane such that player P can guarantee the
transition of the state vector to the set $M$ within time $T$.

3. THE ALGORITHM 

Here the main idea of the algorithm for computing the level sets
$W(T, M)$ of the value function is described.

Let $\Delta$ be a time step of the backward procedure. Let the $i$-th level
set of the value function, namely $W(i\Delta, M)$, be available. This is the
maximal set from where the pursuer P guarantees the termination of
the game within the time $i\Delta$. On the basis of this set, we compute the
set $W((i+1)\Delta, M)$, consisting of all states from which player P
guarantees the attainment of $W(i\Delta, M)$ within time $\Delta$. As a result of such
computations for $i = 0, 1, 2, \ldots$, we obtain the collection of embedded
sets $W(\Delta, M) \subset W(2\Delta, M) \subset \cdots \subset W(i\Delta, M) \subset \cdots \subset W(T, M)$.




Figure 1 Construction of the sets $W(i\Delta, M)$

This is a dynamic programming method. In the theory of differential 
games, the fundamental idea of the backward construction of level sets 
was considered in works of Isaacs, Fleming, Pontryagin, Krasovskii and 
Pschenichnyi. 
The central part of our algorithm is the notion of a front. The front
$F_{i+1}$ contains all points on the boundary of the set $W((i+1)\Delta, M)$ with
the property that the minimal guaranteed time of attaining the previous
set $W(i\Delta, M)$ is precisely $\Delta$. The side of the front in the backward time
direction will be called negative, and the opposite side will be called
positive, as in Figure 1. The algorithm computes a new front $F_{i+1}$ using
the previous front $F_i$. For the first step of the backward procedure, $F_0$
coincides with the usable part [7] of the boundary of $M$. The barrier
lines are obtained via connection of the corresponding ends of the fronts.
In Figure 1, the lines $ab$ and $cd$ are barriers.

We explain briefly how the fronts are constructed. Using the notation
$p(x) = (-x_2, x_1)' \cdot w^{(1)}/R$ and $g = (0, -w^{(1)})'$, we rewrite the equations
(1) as $\dot{x} = p(x)\varphi + v + g$. In the computation, each front is stored as an
ordered collection of points, so fronts are polygonal lines. An apex of a
polygonal line is called a point of local convexity if the angle between the
positive sides of the adjoining links is less than $\pi$. An apex of a front is a
point of local concavity if the above angle is greater than $\pi$. A cone $K$ of
outer (inner) normal vectors is assigned to each point of local convexity
(concavity). At the endpoints of the front, the cone $K$ is defined in a
different way: one extreme ray of the cone is the outer normal vector
to the front link, and the other extreme ray of the cone is defined by
some special relations. For each fixed point $x^* \in F_i$ of local convexity
and any vector $\ell \in K(x^*)$, the extremal controls $\varphi^\circ$, $v^\circ$ take the values
$\varphi^\circ = \arg\min\{\ell' p(x^*)\varphi : |\varphi| \le 1\}$ and $v^\circ = \arg\max\{\ell' v : v \in Q(x^*)\}$.
Similarly, for the points of local concavity, the extremal controls are
$\varphi^\circ = \arg\max\{\ell' p(x^*)\varphi : |\varphi| \le 1\}$ and $v^\circ = \arg\min\{\ell' v : v \in Q(x^*)\}$.

The extremal control of player P can switch its value from one 
extreme value to another, not only at the apexes of the front, but also at 
inner points of the front links. In the game considered, such a switching 
occurs at not more than one inner point of each front link, due to the 
linearity of the dynamics in x and p. The points where such a switching 
takes place will be called neutral. The collection of all neutral points is 
included (with the preservation of ordering) in the collection of apexes 
defining the front. Each neutral point divides the original link into two 
parts that are also considered as links of the front. 

Other additional division points on the front links may also be in- 
troduced, which take into account the dependence of the constraint on 
the control of player E on x. The cone K for a neutral or additional 
point contains only the outer normal vector at the point. Further, the 
extremal controls of the players are given by the above formulae for the 
local convexity case. 
Using the extremal controls, one computes the extremal trajectory
$x(\sigma) = x^* - \sigma\,(p(x^*)\varphi^\circ + v^\circ + g)$, $\sigma \in (0, \Delta]$, in reverse time. If the
extremal controls are not unique at $x^*$, a bundle of extremal trajectories
emanating from the point $x^*$ is considered.
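
The extremal controls and one step of the reverse-time extremal trajectory can be sketched as follows. The sketch treats $Q(x^*)$ as an exact disc (the computations in the paper approximate it by a polygon), so the evader's extremal control has a closed form. The function name `extremal_step` and the numerical values used in the check, including the pursuer's velocity `w1`, are assumptions.

```python
import math


def extremal_step(x, l, delta, w1, R, w_e, s, convex=True):
    """Return (x(delta), phi, v) for the point x and the normal vector l.

    convex=True uses the rule for points of local convexity
    (phi minimises l'p(x)phi, v maximises l'v); convex=False swaps them.
    """
    x1, x2 = x
    p = (-w1 * x2 / R, w1 * x1 / R)           # p(x)
    g = (0.0, -w1)
    r = min(math.hypot(x1, x2), s) * w_e / s  # radius of the disc Q(x)
    lp = l[0] * p[0] + l[1] * p[1]            # l'p(x)
    norm = math.hypot(l[0], l[1])
    sign = 1.0 if convex else -1.0
    # If lp == 0 the pursuer's extremal control is not unique;
    # one of the two extreme values is returned in that case.
    direction = 1.0 if lp >= 0.0 else -1.0
    phi = -sign * direction
    v = (sign * r * l[0] / norm, sign * r * l[1] / norm)
    f = (p[0] * phi + v[0] + g[0], p[1] * phi + v[1] + g[1])
    # Reverse time: move against the velocity, with p frozen at x.
    return (x1 - delta * f[0], x2 - delta * f[1]), phi, v


if __name__ == "__main__":
    # w1 = 1.0 is an assumed value; R, s and the step follow Section 6.
    print(extremal_step((1.0, 0.0), (0.0, 1.0), 0.005, 1.0, 0.8, 0.4, 0.75))
```

A full implementation would also have to handle the bundle of trajectories that arises when the extremal controls are not unique, which this sketch does not attempt.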

As $x^*$ ranges through $F_i$, the ends of the extremal trajectories at
$\sigma = \Delta$ are used to form the next front $F_{i+1}$. One can divide $F_i$ into
regular parts so that the extremal trajectories emanating from the points
of one part do not intersect for $\sigma \in (0, \Delta]$. Thus each regular part
generates a regular field of extremal trajectories. The ends of these
trajectories form an ordered collection of points. Being connected, these
points give a polygonal line, which is called the secondary arc. The
new front $F_{i+1}$ is obtained by processing the regular secondary arcs, the
processing being reduced to the intersection of secondary arcs.

We consider this procedure for the simple case shown in Figure 2.
Here the front $F_i$ consists of two regular parts, both composed of
local convexity points. The ends of the extremal trajectories computed
at $\sigma = \Delta$ give two secondary arcs, as shown in the left half of the figure.
For the points at the crossing ends of these arcs, the control of player E
can be chosen so that the trajectories of the system (1) cannot reach the
front $F_i$ within time $\Delta$. Therefore, the “swallow tail” formed by these
ends is not included in the front $F_{i+1}$, which is drawn on the right hand
side of Figure 2.





Figure 2 Construction of fronts 
Unfortunately, very often, it is not sufficient to intersect neighboring
secondary arcs only. In Figure 3, for example, the secondary arcs $S_1$, $S_2$
and $S_3$ are computed sequentially, but the next front is obtained due to
the intersection of $S_1$ and $S_3$.




Figure 3 Secondary arcs: complicated case of disposition 

Thus the algorithm produces a collection of fronts. In the course 
of computations, possible self-intersections of fronts and their collisions 
with the barrier lines are processed. The details of the algorithm can be 
found in [12, 13, 14]. 

4. SEMIPERMEABLE CURVES 

In this section, the results of some analysis of families of semiperme- 
able curves in differential games with homicidal chauffeur dynamics will 
be given. Using these results, one can find the solvability sets of the 
game of kind [7]. Since the set $W(T, M)$ converges to the solvability set
of the corresponding game of kind as $T \to \infty$, solutions to this game
of kind can be used for verifying the computation of the sets $W(T, M)$.
The families of semipermeable curves can also be helpful for checking 
the computations of level sets of the value function within solvability 
sets. 

The families of semipermeable curves are determined from only the
dynamics of the system and the bounds on the controls of the players.
We explain now what semipermeable curves mean. Let

$$H(\ell, x) = \min_{\varphi \in [-1,1]} \max_{v \in Q(x)} \ell' f(x, \varphi, v) = \max_{v \in Q(x)} \min_{\varphi \in [-1,1]} \ell' f(x, \varphi, v), \quad x \in R^2, \qquad (2)$$

be the Hamiltonian of the game. Here $f(x, \varphi, v) = p(x)\varphi + v + g$. It
is easy to see that the function $\ell \to H(\ell, x)$ is convex in $\ell$ in the cones
$\ell' p(x) \ge 0$ and $\ell' p(x) \le 0$ for any fixed $x \in R^2$. Fix $x$ and consider
$\ell$ such that $H(\ell, x) = 0$. Letting $\varphi^* = \arg\min\{\ell' p(x)\varphi : \varphi \in [-1, 1]\}$
and $v^* = \arg\max\{\ell' v : v \in Q(x)\}$, it follows that $\ell' f(x, \varphi^*, v) \le 0$ holds
for any $v \in Q(x)$, and $\ell' f(x, \varphi, v^*) \ge 0$ holds for any $\varphi \in [-1, 1]$. This
means that the direction $f(x, \varphi^*, v^*)$, which is orthogonal to $\ell$,
separates the vectograms $U(v^*) = \{f(x, \varphi, v^*) : \varphi \in [-1, 1]\}$ and
$V(\varphi^*) = \{f(x, \varphi^*, v) : v \in Q(x)\}$ of the players P and E as in Figure 4. Such a
direction is called semipermeable. A smooth curve is called a
semipermeable curve if the tangent vector at any point of this curve is a
semipermeable direction.
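
For a disc-shaped $Q(x)$ of radius $r$, the minimum over $\varphi$ and the maximum over $v$ in (2) can be taken separately, which gives $H(\ell, x) = -|\ell' p(x)| + r|\ell| + \ell' g$. The sketch below evaluates this expression for unit vectors $\ell$ and counts the sign changes of $H$ around the unit circle, that is, the number of roots of $H(\ell, x) = 0$ at the point $x$. Scanning the unit circle on a uniform grid, the function names, and the parameter values in the check (in particular the pursuer's velocity `w1`) are assumptions made for illustration.

```python
import math


def hamiltonian(theta, x, w1, R, r):
    """H(l, x) for the unit vector l = (cos theta, sin theta) and disc radius r."""
    l1, l2 = math.cos(theta), math.sin(theta)
    lp = w1 * (-l1 * x[1] + l2 * x[0]) / R   # l'p(x)
    lg = -w1 * l2                            # l'g
    return -abs(lp) + r + lg                 # r*|l| with |l| = 1


def count_roots(x, w1, R, r, n=3600):
    """Count sign changes of H(., x) on a uniform grid of n directions."""
    vals = [hamiltonian(2.0 * math.pi * k / n, x, w1, R, r) for k in range(n)]
    return sum(1 for k in range(n) if vals[k] * vals[(k + 1) % n] < 0.0)


if __name__ == "__main__":
    # Assumed values: w1 = 1.0, R = 0.8, constant radius r = 0.4.
    for point in [(0.0, 0.0), (2.0, 0.0)]:
        print(point, count_roots(point, 1.0, 0.8, 0.4))
```

A grid scan can miss a pair of roots that lie closer together than the grid step, so a production code would refine each bracket and treat tangential roots separately.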




Figure 4 Semipermeable direction 



We now describe how the families of semipermeable curves can be
obtained. The semipermeable directions are derived from the roots of
the equation $H(\ell, x) = 0$. We distinguish the roots “$-$” to “$+$” and the
roots “$+$” to “$-$”. When classifying these roots, we suppose that $\ell \in \mathcal{E}$,
where $\mathcal{E}$ is the boundary of a convex polygon containing the origin.
We say that $\ell^*$ is a root $-$ to $+$ if $H(\ell^*, x) = 0$, and if $H(\ell, x) < 0$
($H(\ell, x) > 0$) for $\ell < \ell^*$ ($\ell > \ell^*$) that are sufficiently close to $\ell^*$, where
the notation $\ell < \ell^*$ means that the direction of the vector $\ell$ can be
obtained from the direction of the vector $\ell^*$ using a counterclockwise
rotation through an angle not exceeding $\pi$. The roots $-$ to $+$ and the
roots $+$ to $-$ are called roots of the first and second type, respectively.
Due to the above mentioned property of the piecewise convexity of the
function $H(\cdot, x)$, the equation $H(\ell, x) = 0$ can have at most two roots
of each type for any given $x$.
Let us denote the roots by $\ell^{(i),j}(x)$. The left index corresponds to the
type of root ($-$ to $+$ or $+$ to $-$). The right index takes the value 1 or 2,
and indicates whether the minimum in (2) occurs for $\varphi = 1$ or $\varphi = -1$.

One can find the domains of the functions $\ell^{(i),j}(\cdot)$. They have very
simple structures for the classical formulation of the homicidal chauffeur
problem. In this case, the constraint $Q$ on the control of player E does
not depend on $x$, so we have $w^{(2)}(x) = w_e$.

Figure 5 shows the domains of $\ell^{(i),j}(\cdot)$ in the classical case
$w^{(2)}(x) = w_e < w^{(1)}$. Two symmetric cones with a joint apex at the origin are cut
by polygonal approximations to circular arcs of radius $R w_e / w^{(1)}$, the
centers of the arcs being at the points $(-R, 0)$ and $(R, 0)$. The regions
of values of $x$ where two roots of each type exist are marked A and B in
the figure. There is only one root of each type at the points $x$ that are
outside A and B.




Figure 5 Domains of the functions $\ell^{(i),j}(\cdot)$ when $w^{(2)}(x) = w_e < w^{(1)}$

In Figure 6, the domains of $\ell^{(i),j}(\cdot)$ are depicted for $w^{(2)}(x) = w_e > w^{(1)}$.
The digits 4, 2 and 0 state the number of roots. In this case, a
region C (the intersection of the circles of radius $R w_e / w^{(1)}$ with centers
at $(-R, 0)$ and $(R, 0)$) occurs where roots do not exist. The following
property holds true for any point $x \in C$: for any $\varphi \in [-1, 1]$ there exists
$v \in Q$ such that $f(x, \varphi, v) = 0$. Therefore, in the region C, player E can
counter any control of player P, so the state remains immobile for all
time. Further, if a point $x$ with the above property does not belong to
the terminal set $M$, then $M$ cannot be reached from $x$. Regions of such
points are called the superiority sets of player E.
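
The defining property of a superiority set can be checked pointwise. Since $p(x)\varphi + g$ is affine in $\varphi$, the condition that some $v \in Q$ cancels it for every $\varphi \in [-1, 1]$ reduces to $|p(x)\varphi + g| \le r$ at $\varphi = \pm 1$, where $r$ is the radius of $Q$. The following sketch tests this condition; the function name `in_superiority_set` and the parameter values in the check (in particular the pursuer's velocity `w1`) are assumptions made for illustration.

```python
import math


def in_superiority_set(x, w1, R, w_e, s=None):
    """True if, for every phi in [-1, 1], some v in Q gives f(x, phi, v) = 0.

    With s=None the disc Q has constant radius w_e (classical case);
    otherwise its radius is min(|x|, s) * w_e / s.
    """
    x1, x2 = x
    r = w_e if s is None else min(math.hypot(x1, x2), s) * w_e / s
    for phi in (-1.0, 1.0):
        # |p(x)phi + g|; by convexity the extreme values of phi suffice.
        drift = math.hypot(-w1 * x2 * phi / R, w1 * x1 * phi / R - w1)
        if drift > r:
            return False
    return True


if __name__ == "__main__":
    # Assumed value w1 = 1.0; R = 0.8 and s = 0.75 follow the paper.
    print(in_superiority_set((0.0, 0.0), 1.0, 0.8, 1.5))
    print(in_superiority_set((0.0, 0.75), 1.0, 0.8, 1.8, s=0.75))
```

In the classical case the test is equivalent to requiring that $x$ lies within distance $R w_e / w^{(1)}$ of both $(-R, 0)$ and $(R, 0)$, which is the intersection of circles described above.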

Using the forms of the domains of $\ell^{(i),j}(\cdot)$ in the classical case, one can
construct the domains for the case when $Q$ depends on $x$. We describe
schematically how it can be done. First note that $w^{(2)}(x)$ is constant on the
circumference of any fixed circle with center $(0, 0)$. Further, $w^{(2)}(x) = w_e$
holds outside the circle of radius $s$. Let $\Omega(r)$ be the circumference of the
circle of radius $r$ with center at $(0, 0)$. Find $w^{(2)}(r) = \min\{r, s\}\, w_e / s$.
If $w^{(2)}(r) < w^{(1)}$, then put the points $x \in \Omega(r)$ onto the domains of
Figure 5 constructed for $w_e = w^{(2)}(r)$. Otherwise, if $w^{(2)}(r) > w^{(1)}$,
then put these points onto the domains of Figure 6. Thus a division of
$\Omega(r)$ into arcs is obtained. The number and the type of roots are the
same for all points of each arc. In Figure 7, the division points $a$, $b$, $c$
and $d$, and those symmetric to them in the left half-plane, are shown,
$\Omega(r)$ being the dotted line. In Figure 8, the division points $e$ and $f$, and
those symmetric to them, are depicted.




Figure 7 Construction of domains of $\ell^{(i),j}(\cdot)$ when $w^{(2)}(r) < w^{(1)}$




236 V.S. Patsko and V.L. Turova 



This technique is applied for every $r$ in $[0, s]$, and identically named
division points are connected. Thus the circle of radius $s$ is divided into
parts according to the kinds of roots. Outside this circle, the dividing
lines coincide with the lines constructed for the case when $Q$ does not
depend on $x$. We use the lines of Figure 5 or Figure 6, depending on
$w_e < w^{(1)}$ or $w_e > w^{(1)}$, respectively.




Figure 8 Construction of domains of $\ell^{(i),j}(\cdot)$ when $w^{(2)}(r) > w^{(1)}$




Figure 9 Domains of the functions $\ell^{(i),j}(\cdot)$ when $Q$ depends on $x$ and $w_e = 0.8$

Figure 10 Superiority sets of player E when $Q$ depends on $x$ and $w_e = 1.8$

Figure 11 Superiority set of player E when $Q$ depends on $x$ and $w_e = 2$

Figures 9, 10 and 11 were constructed in this way for the parameters
$R = 0.8$, $s = 0.75$ and $w_e = 0.8$, 1.8 and 2. In Figure 9,
the domains of the functions $\ell^{(i),j}(\cdot)$ are shown, and also the sets that
are analogous to A and B in Figure 5 are marked. In Figure 10, two
symmetric superiority sets of player E arise, the upper set being denoted
by $C_U$ and the lower set by $C_L$. If we increase $w_e$, the sets $C_U$ and $C_L$
expand and form the doubly connected region that is denoted by C in
Figure 11. The number of roots of the equation $H(\ell, x) = 0$ is also given
in Figures 10 and 11.

Figure 12 shows a fragment of the central part of Figure 10. The lines
that separate the domains of the functions $\ell^{(i),j}(\cdot)$ are included.

Figure 12 Domains of the functions $\ell^{(i),j}(\cdot)$ in part of Figure 10

The function $\ell^{(i),j}(\cdot)$ is Lipschitz continuous on any closed bounded
subset of the interior of its domain. We consider the two-dimensional
differential equation

$$dx/dt = \Pi\, \ell^{(i),j}(x), \qquad (3)$$

where $\Pi$ is the matrix of rotation through the angle $\pi/2$, the rotation
being clockwise or counterclockwise if $j = 1$ or $j = 2$, respectively. Since
the tangent vector at each point of the trajectory defined by this equation
is a semipermeable direction, the trajectories are semipermeable curves.
Therefore player P can keep the state vector $x$ on one side of the curve
(positive side), and player E can keep $x$ on the other (negative) side.
Further, equation (3) specifies a family $\Lambda^{(i),j}$ of semipermeable curves,
such that a unique smooth semipermeable curve goes through each point
$x$ of the domain of $\ell^{(i),j}(\cdot)$, the root $\ell^{(i),j}(x)$ being the normal vector to
the curve at the point $x$. The notation $p^{(i),j}$ will be used for the curves
of the family $\Lambda^{(i),j}$.

The family $\Lambda^{(1),1}$ for the values of the parameters of Figure 10 is
depicted in Figure 13. The arrows show the direction of motion in reverse
time. The other families can be obtained from $\Lambda^{(1),1}$
by reflections in the $x_1$- and $x_2$-axes.

Figure 13 Family of semipermeable curves for the root $\ell^{(1),1}$

5. SUPERIORITY SETS 

In this section, the role of superiority sets in the appearance of holes
within the solvability sets will be explained. As noted above, there can
be one doubly connected superiority set C of player E, or two simply
connected sets $C_U$ and $C_L$, or the superiority set can be empty.

Let $D$ be a connected superiority set of player E, and let the objective
of this player be to bring the state of the system to the set $D$. Denote by
$D^*$ the maximal solvability set (victory domain) of player E. It follows
from the definition of $D^*$ that E can bring the state of the system to
$D$ from any point $x \in D^*$, but player P can prevent the state of the
system from approaching the set $D$ for any point $x \notin D^*$. Since $D$ is
a superiority set of E, it possesses the property of $v$-stability [9, 10] (or
viability for E [1, 6]), and the set $D^*$ is $v$-stable too. This means that
player E can hold the trajectories of the system in $D^*$ for infinite time.
Hence, if $D^* \cap M = \emptyset$, then the time for achieving the terminal set $M$
in the main problem is infinite for any $x$ in $D^*$.

The boundary of $D^*$ is composed of smooth semipermeable curves of
the families $\Lambda^{(i),j}$. The joins of these curves are called “sewing points”,
and they possess the semipermeability property [5]. In some cases, a
part of the boundary of $D^*$ can coincide with a part of the boundary
of $D$.

Due to the simple geometry of the sets $D$ of the problem considered,
the sets $D^*$ can be obtained easily using the families of semipermeable
curves. For example, Figure 14 shows the configuration of $D^*$ when
$D = C_U$ as in Figure 10. The sewing point of the two semipermeable
curves and the symmetric sewing point lie on the boundary of $D$.

Figure 14 Generation of the hole $D^*$ due to the superiority set $D$



Since level lines of the value function do not “penetrate” into the sets
$D^*$ in the case $D^* \cap M = \emptyset$, one can easily generate examples where
holes occur in the solvability sets, using this knowledge of the geometry
of the sets $D^*$.




6. COMPUTATIONAL RESULTS

In this section, the dependence of the solution on the parameter $w_e$ is
demonstrated. Other parameters of the problem are fixed and have the
values $R = 0.8$ and $s = 0.75$. The circle $Q(x)$ is approximated
by a polygon. Let $\tau$ be the reverse time in the backward procedure for
the construction of fronts. The optimal time for a given state $x$ is the
least time $\tau$ such that $x \in W(\tau, M)$.




Figure 15 200 upper and lower fronts for $w_e = 0.4$ (every 10th front is plotted)

In Figure 15, the initial computations for $w_e = 0.4$ are shown. The
step $\Delta$ is 0.005. The usable part of the terminal set $M$ consists of three
segments: the upper side of $M$ and two segments on the lower side.
The upper fronts that occur until $\tau = 0.29$ are bounded on the left and
right by barrier lines. At $\tau = 0.29$, these barrier lines meet the upper
boundaries of the sets A and B (see Figure 9), so they terminate. The
value function is discontinuous across the barrier lines. For $\tau > 0.29$, the
fronts begin to envelop the barrier lines, and left and right corner points
arise. The propagation of the front beyond the barrier lines from these
corner points is at a very low rate. An enlargement of this development
of the fronts on the right hand side is presented in Figure 16.




Figure 16 The structure of fronts near the barrier line 

The continuation of the computation is shown in Figure 17. The upper
and lower fronts are calculated until $\tau = 1.6$ and $\tau = 3.3$, respectively.
The left and right lower fronts collide at $\tau = 1.76$. Only one lower front
remains after this collision. The greatest value of $\tau$ below $M$ occurs on
the lower boundary of $M$ at the point $(0, -0.2)$.

An enlargement of the accumulation of the lower fronts is shown in
Figure 18. We see that the end of the front moves along the terminal
set from the end of the usable part to the point $a$ on the boundary of
the set B. The accumulation of fronts begins when they approach the
semipermeable curve that emanates from the point $a$, as shown in
Figure 15. The value function changes very rapidly in the accumulation
region, but it remains continuous.
Figure 17 320 upper fronts and 660 lower fronts for $w_e = 0.4$

Figure 19 presents the computational results for $w_e = 0.95$ and
$\Delta = 0.005$. As in the previous example, the upper barrier lines end at some
moment of reverse time, and the fronts begin to envelop them. The main
difference from before is the formation of a loop where the upper fronts
from the two sides of the figure meet. In this example, the region within
this loop (a “lagoon”) is filled out entirely by the further development
of the fronts, the filling out being completed at $\tau = 1.68$.

An important feature of the lower part of Figure 19 is that the
semipermeable curve emanating from the point $a$ intersects the right
barrier, which is itself a semipermeable curve. This did not happen in the
previous example. Thus the right lower fronts are confined to the right
side of the curve emanating from $a$. The time of attaining the terminal set
becomes infinite as the fronts approach this curve. A symmetric situation
occurs for the left lower fronts. All the fronts are computed until $\tau = 2.4$.
Figure 18 The accumulation of fronts near the point $a$



The following facts were found experimentally. A lagoon is generated
by the upper fronts only if $w_e \ge 0.65$. For $w_e \in [0.65, 1.37)$, a lagoon
occurs and is completely filled by the further development of the fronts.
For $w_e \in [1.37, 1.61]$, the fronts do not fill the lagoon completely. For
$w_e > 1.61$, the lagoon disappears.

Figure 20 presents computational results for $w_e = 1.5$ and $\Delta = 0.005$.
The left and right parts of the upper front meet at $\tau = 2.855$. Then
the computation within the lagoon begins. The fronts do not penetrate
the set $D^*$, which is a hole inside the solvability set of player P, the
value function being infinite for $x \in D^*$. The computation is done until
$\tau = 3.73$. The structure of the lower fronts is similar to that in the
previous example.

In Figure 21, a three-dimensional graph of the value function of the
Figure 20 example is presented. The axes in the horizontal plane are
$x_1$ and $x_2$, and the vertical axis measures the value function. The
picture shows the value function for the region of $(x_1, x_2)$ where the fronts
are computed. The programs for the visualization of such graphs were
developed [2] by V. Averbukh and O. Pykhteev, Department of System
Support, Institute of Mathematics and Mechanics, Ekaterinburg.

Figure 19 480 upper and lower fronts for $w_e = 0.95$ (every 10th front is plotted)

Further increases in the value of $w_e$ extend the set $D^*$. For
example, Figure 22 gives computational results for $w_e = 1.9$ and $\Delta = 0.01$.
The upper and lower fronts are computed until $\tau = 8.42$ and $\tau = 1.6$,
respectively.

7. CONCLUSION 

In this paper, we have studied a variant of the homicidal chauffeur
differential game, under the assumption that the constraint on the
control of the evader depends on the state. The two-dimensional situation
allows a complete description of the families of semipermeable curves
that occur. The superiority sets of the evader, where semipermeable
curves do not exist, are detected. Thus the presence of holes that are
strictly inside the “victory domains” of the pursuer is explained. A short
description of the backward procedure for the computation of level sets
of the value function is also given. This procedure can be employed as
a specific algorithm for two-dimensional front propagation.

Figure 20 746 upper fronts and 340 lower fronts for $w_e = 1.5$ (every 10th front is
plotted)

Figure 21 The graph of the value function for $w_e = 1.5$
Abstract This study is motivated by a need to design a diffractive optical element
arising in an application. Under realistic manufacturing constraints, 
it can be shown that the design problem is an optimization calcula- 
tion with integer variables. We consider an optimization strategy based 
on Genetic Algorithms. We show that for a particular variant, called 
a Micro-Genetic Algorithm, the algorithm converges in a probabilistic 
sense to the global optimum. We demonstrate the use of the algorithm 
in the design of a diffractive optical element. 

Keywords: Diffractive Optics, Genetic Algorithms, Optimization 

1. INTRODUCTION 

We study the problem of creating a desired light intensity pattern on a 
screen. A light source is given. What is required is a thin-film diffractive 
optical element, a lens, that alters the incoming light to produce the 
desired pattern. A sketch of the setup is shown in Figure 1. Under the 
thin-lens approximation, the desired unknown is a thickness distribution 
over the film. This is a form of inverse problem. 
Figure 1 A sketch of the optimal design problem. Light enters the aperture where
a thin film element has been placed. The incoming light is altered as it exits the
film. As the light strikes the screen, it produces an intensity pattern on the screen.
The inverse problem is to design a diffractive optical element that produces a desired
intensity pattern.



The so-called ‘direct problem’ involves finding the intensity pattern 
given a light source, a diffractive optical element, and the geometry of the 
problem. We will show that this process is akin to function evaluation, 
and, under the Kirchhoff approximation, amounts to a quadrature. 

The inverse problem then is to find a diffractive optical element which 
produces an intensity pattern that is as close as possible to the desired 
pattern. Such a problem is very naturally cast as an optimization cal- 
culation. What is somewhat unusual about this work is that we are, 
in addition, given information about the manufacturing process for the 
diffractive optical element. The process poses additional constraints, 
which can be modeled as integer variables with simple bounds. How- 
ever, the size of the problem, usually involving over 20000 variables in 
practical applications, makes it extremely difficult to solve using stan- 
dard optimization methods. 

We propose the use of a variant of the Genetic Algorithm [4, 7, 12]. 
Genetic Algorithms have found success in several areas of optimal de- 
sign in optics [3, 9], as well as in other areas. We opted to use the 
Micro-Genetic Algorithm [11]. We study its convergence properties using 
Markov Chain analysis [15, 16]. The use of the algorithm applied 
to the optical problem is demonstrated in a numerical example. We 
end the paper with a discussion of some of the issues that occur in the 
optimization calculation. 

2. OPTIMAL DESIGN OF DIFFRACTIVE 
OPTICAL ELEMENTS 

Consider the 2-dimensional problem that is indicated in Figure 2. A 
lens has been placed in an opening of width 2w in an opaque screen. An 
incident light propagates from the left and is diffracted by the lens-screen 
setup. The diffracted light is gathered on the image plane z = d. 

Under the scalar model of light, the scalar field, which could represent 
the strength of the electric or magnetic field, satisfies the Helmholtz 
equation. Specifically, the scalar field $u(x, z)$ has the property 

$$\Delta u + k^2 n(x, z)^2 u = 0,$$

where $n(x, z)$, the index of refraction, is normalized to 1 in the air, 
and is equal to $n_0$ in the lens. The variable $k$ represents the 
wavenumber in air. To complete the description of the diffraction problem, 
we would specify boundary conditions on the opaque screen along 
$\{z = 0, |x| > w\}$, and the Sommerfeld radiation condition for 
$\sqrt{x^2 + z^2} \to \infty$. By specifying an incident wave, in this case the wave generated 
by the source, one can determine the scattered wave by solving the 
partial differential equation with the boundary conditions. Then, after 
calculating $u(x, z)$, one can easily find the intensity at the image plane 
$z = d$, because it has the value $|u(x, d)|^2$. 

When the features in the diffractive optical element are many times 
larger than the wavelength of light, we can approximate the field diffracted 
by the lens using the Kirchhoff approximation [2, 5] (see also [14]). Under 
this approximation, the field at a point $(x_0, z_0)$ is related to its value at 
the aperture by 

$$u(x_0, z_0) = \frac{ik}{2} \int_{-w}^{w} u(x, 0)\, \frac{z_0}{r}\, H_1^{(1)}(kr)\, dx, \qquad (1)$$

where $r = \sqrt{(x - x_0)^2 + z_0^2}$ and $H_1^{(1)}$ is the Hankel function of the 
first kind of order 1. This formula allows us to find the field at $(x_0, z_0)$ 
if we know the field at the points where the light leaves the lens. 

[Figure 2 label: incident wave] 

Figure 2 In the scalar model of light, the field $u(x, z)$ satisfies the Helmholtz equation. 
The lens has an index of refraction different from that of air. There are boundary 
conditions on the opaque parts of the screen, and where $\sqrt{x^2 + z^2} \to \infty$. The intensity 
at the image plane is $|u(x, d)|^2$. 

Another approximation, based on the fact that the lens is thin, is 
applied. Under this approximation, the lens changes only the phase 
of the incoming light. Specifically, if the incident light at $z = 0$ has 
amplitude $u(x, 0)$, the lens alters the amplitude to 

$$u(x, 0) \exp(i\varphi(x)).$$

Here $\varphi$ is the function 

$$\varphi(x) = \frac{2\pi (n_0 - 1)}{\lambda}\, d(x), \qquad -w < x < w,$$

where $d(x)$ is the thickness of film at $x$, $n_0$ is the index of refraction of 
the lens, and $\lambda$ is the wavelength of light in air. 

The diffractive optical element is produced by a material removal 
process. The resulting thickness profile $d(x)$ is piecewise constant over 
regular subintervals of width $\Delta x = 2w/n_x$, where $n_x$ is a prescribed 
positive integer. Each value of $d(x)$ is an integer multiple of some fixed 
thickness $t$. This means that the corresponding phase $\varphi(x)$ is also piecewise 
constant. Assuming that the material removal rate is constant for 
each removal process, we choose $t$, depending on $n_0$ and $\lambda$, so that the 
phase $\varphi(x)$ is of the form 

$$\varphi(x) = f_j \frac{2\pi}{\ell}, \qquad x_j < x < x_{j+1}, \quad j = 0, 1, \ldots, n_x - 1, \qquad (2)$$

$$f_j \in A = \{0, 1, 2, \ldots, \ell - 1\},$$

where $x_k = -w + k\Delta x$, $k = 0, 1, \ldots, n_x$. The variable $\ell$ corresponds 
to the number of steps in the material removal process. Thus the unknowns 
of the problem are the components of a vector $f$ of length $n_x$, 
representing the thickness profile. 
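
The discretization in (2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `cell_edges`, `phase_profile` and `phase_at`, and the 4-cell example profile, are assumptions made here and do not come from the paper.

```python
import math

def cell_edges(half_width, n_cells):
    """Grid points x_k = -w + k*dx, k = 0..n_cells, with dx = 2w/n_cells."""
    dx = 2.0 * half_width / n_cells
    return [-half_width + k * dx for k in range(n_cells + 1)]

def phase_profile(levels, num_levels):
    """Phase of each cell: level f_j maps to f_j * 2*pi / num_levels, as in (2)."""
    if any(f not in range(num_levels) for f in levels):
        raise ValueError("each level must be an integer in 0..num_levels-1")
    step = 2.0 * math.pi / num_levels
    return [f * step for f in levels]

def phase_at(x, levels, num_levels, half_width):
    """Piecewise-constant phase at a point x with -w <= x <= w."""
    n_cells = len(levels)
    dx = 2.0 * half_width / n_cells
    j = min(int((x + half_width) / dx), n_cells - 1)  # index of the cell containing x
    return phase_profile(levels, num_levels)[j]

# Hypothetical element: 4 cells on [-1, 1], 4 etch levels.
levels = [0, 1, 2, 3]
print(cell_edges(1.0, len(levels)))
print(phase_at(0.25, levels, 4, 1.0))
```

The integer vector `levels` plays the role of the unknown $f$; everything else is fixed by the geometry and the number of removal steps.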

The optimization problem can be posed as follows. Assume the incident 
amplitude is constant, $u(x, 0) = C$. We are given the required 
intensity $I_{\mathrm{target}}(x)$ on the image plane $z = d$. We wish to solve 

$$\min_{f \in A^{n_x}} \| I - I_{\mathrm{target}} \|^2 \qquad (3)$$

$$I(x) = |u(x, d)|^2, \qquad (4)$$

$$u(x, d) = \frac{ik}{2}\, C \int_{-w}^{w} \exp(i\varphi(\xi))\, \frac{d}{r}\, H_1^{(1)}(kr)\, d\xi, \qquad (5)$$

where $r = \sqrt{(x - \xi)^2 + d^2}$. The integer vector $f \in A^{n_x}$ is related to 
$\varphi(x)$ through (2). Conservation of energy allows $C$ to be determined 
from $I_{\mathrm{target}}$. 

The difficulties in this problem stem from the facts that (i) the re- 
quired components of / are integer variables, (ii) the number of un- 
knowns Tlx is large in typical design problems, (iii) the functional in (3) 
is nonlinear. While there has been much progress in solving nonlinear in- 
teger programming problems, we were drawn to use Genetic Algorithms 
due to their simplicity. We will demonstrate their ability to “solve” this 
problem in a numerical example, and we will discuss their convergence 
properties. 

3. REVIEW OF GENETIC ALGORITHMS 

Genetic Algorithms are “search algorithms based on the mechanism 
of natural selection and natural genetics” [4]. Principles of evolution and 
heredity are used for the optimization of an objective function. 

Genetic Algorithms are iterative and operate on a set of potential solutions, 
called a population, and utilize concepts such as selection (survival 
of the fittest), crossover (information exchange), and mutation (adding 
genetic diversity) [12]. Genetic Algorithms use only objective function 
information (no derivatives) and probabilistic transition rules. The 
origins of Genetic Algorithms go back to the work of Holland [7] and 
others in the 1970s. In recent years the algorithms have attracted a lot 
of interest, and have been used successfully for optimization in different 
areas, in particular, in diffractive optics design problems [3, 9, 10]. 

Let us consider the most basic version, called the Canonical Genetic 
Algorithm. Suppose we want to solve 

$$\max_{x \in X} f(x), \qquad (6)$$

where $X$ is some feasible set, and $f(x) > 0$ for all $x \in X$. Genetic 
Algorithms use a binary representation of the variable $x$. If $x$ is a continuous 
variable, it is given finite precision, and can thus be expressed 
in binary form. In our problem, each member of $X$ has integer components, 
which can be represented easily by a vector of binaries. Therefore 
we can henceforth assume that $x = (x^1, \ldots, x^\nu)$, $x^j \in \{0, 1\}$. Every 
such string is called a chromosome or an individual. Binary entries 
of the string $x$ are called genes. The cost function $f(x)$ is referred to as 
the fitness. 

A Genetic Algorithm works on a set of $N$ chromosomes (individuals), 
$x_1, x_2, \ldots, x_N$, called a population. The initial population is formed at 
random. Each iteration updates a population to the next generation. 
The basic principle of the update is to increase the average fitness of the 
population. 

At the beginning of each iteration, all individuals of the population 
are evaluated for fitness, that is, we compute $f(x_i)$, $i = 1, 2, \ldots, N$. 
Next the probability for reproduction is assigned to each chromosome, 
according to its relative fitness 

$$p_i = \frac{f(x_i)}{\sum_{j=1}^{N} f(x_j)}.$$

In this way, the fitter members of the population have a better chance 
of being chosen, with some fit members having the possibility of being 
chosen more than once. We pick $N$ chromosomes in this fashion, and 
they form the mating pool. 

The mating pool is subdivided at random into pairs. Each pair of 
parents, with a probability $p_c$ say, performs a crossover to produce 2 
offspring. A crossover is akin to exchanging genetic information. For 
example, in a 2-point crossover 

    x_1 = 1 0 | 0 1 1                1 0 | 1 0 1
    x_2 = 0 1 | 1 0 1    becomes     0 1 | 0 1 1 .

The crossover point, which could be chosen at random, is indicated above 
by a vertical line. If a crossover is performed, 2 offspring replace their 
2 parents, but otherwise both parents proceed to the next generation 
unchanged. 

Finally, mutation is applied to each gene (binary element of the vector 
$x$) of each offspring, with a probability $p_m$ that is usually very small. 
The mutated gene is replaced by the opposite value (0 → 1 or 1 → 0). 
Even when the probability of mutation is tiny, its role is important as an 
additional provider of diversity, so it is a source of valuable information 
that could be missing in the initial population. 

This is the end of a single iteration. A new population is formed 
and the process is repeated until a convergence condition is satisfied or 
until the number of iterations reaches a prescribed limit. The process is 
summarized in the pseudo-code in Figure 3. 



    create initial population of N chromosomes 
    until convergence, do 
        select parent pairs 
        crossover 
        mutate 
        new population 
    end 



Figure 3 The Canonical Genetic Algorithm 

By convergence, we mean that the highest fitness value of the popu- 
lation has reached an acceptable value. Alternatively, we could stop the 
algorithm when the maximum of the population is sufficiently close to 
the average fitness of the population. 
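
One generation of the Canonical Genetic Algorithm described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the crossover exchanges tails at a single randomly chosen point, mutation is applied to every member of the new population, and the fitness `ones_fitness` (number of ones plus one, so that it stays positive) is a hypothetical stand-in for a real objective.

```python
import random

def roulette_select(population, fitness, rng):
    """Draw N chromosomes, each with probability f(x_i) / sum_j f(x_j)."""
    weights = [fitness(x) for x in population]
    return rng.choices(population, weights=weights, k=len(population))

def crossover_at(a, b, point):
    """Exchange the tails of two parents after the given crossover point."""
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(x, p_m, rng):
    """Flip each gene independently with probability p_m."""
    return [1 - g if rng.random() < p_m else g for g in x]

def next_generation(population, fitness, p_c, p_m, rng):
    """One iteration: select a mating pool, pair at random, cross over, mutate."""
    pool = roulette_select(population, fitness, rng)
    rng.shuffle(pool)
    new_population = []
    for i in range(0, len(pool) - 1, 2):
        a, b = pool[i], pool[i + 1]
        if rng.random() < p_c:
            a, b = crossover_at(a, b, rng.randrange(1, len(a)))
        new_population += [a, b]
    if len(pool) % 2:  # odd N: the last member has no partner
        new_population.append(pool[-1])
    return [mutate(x, p_m, rng) for x in new_population]

def ones_fitness(x):
    """Hypothetical positive fitness: number of ones, plus one."""
    return sum(x) + 1

rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(6)]
for _ in range(30):
    population = next_generation(population, ones_fitness, 0.8, 0.01, rng)
print(max(ones_fitness(x) for x in population))
```

Calling `crossover_at([1, 0, 0, 1, 1], [0, 1, 1, 0, 1], 2)` reproduces the crossover example given in the text.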

There are 3 parameters in the algorithm: (i) size of population $N$, 
(ii) crossover probability $p_c$, and (iii) mutation probability $p_m$. These 
parameters tend to be problem specific, and there are no general rules on 
how to set them [6]. An evolutionary strategy in which the parameters 
are allowed to evolve just like the population according to some rule has 
also been proposed [12]. 

An important variant to the Canonical Genetic Algorithm is one in 
which the fittest member of the population in each generation is saved. 
This guarantees that the maximum fitness value of the population is 
nondecreasing. 

The Micro-Genetic Algorithm, first proposed in [10, 11], works on a 
small population of m + 1 chromosomes (for example, m = 5). There 
is an inner loop and an outer loop. Within the inner loop, selection 
and crossover take place, while the fittest member of the population is 
saved in each generation. After a convergence criterion is satisfied, we go 
out of the inner loop and regenerate a new population by mutating the 
fittest member m times. The new population then enters the inner loop. 
A feature suggested for this scheme is that the mutation probability 
$p_m$ should be made smaller adaptively as we near convergence. The 
pseudo-code for the algorithm is given in Figure 4. 

    create initial population of m + 1 chromosomes 
    for iteration = 1 to MAX 
        until convergence, do 
            save best 
            generate m more chromosomes for the next 
            population by performing uniform crossover 
        end 
        keep best chromosome 
        mutate m times to create the other elements 
        of the new population 
    end 

Figure 4 The Micro-Genetic Algorithm. 

The crossover in Figure 4 is of a special type called ‘uniform crossover’. 
In this process, we take m members of the population and create m new 
ones by sequentially picking genes from the pool with uniform probability. 
That is, in creating a new $x$ of length $\nu$, we pick the first gene (bit) 
from the first bits of each of the m members, with uniform probability. 
We pick the second bit from the second bits of the m members similarly, 
etc. 
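
The structure of Figure 4 can be sketched in Python as below. Several choices here are assumptions made for illustration only: the gene pool for uniform crossover is taken to be the whole current population, the inner loop stops either when the population is uniform or after `inner_iters` generations, the mutation probability is held fixed, and the fitness passed in is a hypothetical one.

```python
import random

def uniform_crossover(pool, rng):
    """Child whose j-th gene is copied from the j-th gene of a random pool member."""
    return [rng.choice(pool)[j] for j in range(len(pool[0]))]

def mutate(x, p_m, rng):
    """Flip each gene independently with probability p_m."""
    return [1 - g if rng.random() < p_m else g for g in x]

def micro_ga(fitness, length, m, outer_iters, inner_iters, p_m, rng):
    """Return the best chromosome found and the best fitness after each outer step."""
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(m + 1)]
    history = []
    for _ in range(outer_iters):
        for _ in range(inner_iters):
            best = max(population, key=fitness)            # save best
            if all(x == best for x in population):
                break                                      # uniform population reached
            children = [uniform_crossover(population, rng) for _ in range(m)]
            population = [best] + children
        best = max(population, key=fitness)                # keep best chromosome
        history.append(fitness(best))
        population = [best] + [mutate(best, p_m, rng) for _ in range(m)]
    return max(population, key=fitness), history

rng = random.Random(1)
best, history = micro_ga(lambda x: sum(x) + 1, 12, 4, 50, 20, 0.1, rng)
print(best, history[-1])
```

Because the best member is carried over at every step, the recorded best fitness can never decrease from one outer iteration to the next.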

Two more features of the Micro-Genetic Algorithm are worth noting. 
Firstly, because we work with a small population, the inner loop 
converges fairly rapidly. Secondly, the purpose of the outer loop is to 
introduce diversity via mutation, in order to explore other areas of the 
search space. 

One clear drawback of Genetic Algorithms is that they do not have 
a good stopping rule. Indeed, our numerical experience shows that im- 
provements in the fitness function as we iterate tend to occur abruptly 
after a few steps that make little or no progress. However, we adopt 
the point of view that the problem is to get an acceptable design, i.e., 
a design whose residual in the fitness function is small relative to the 
norm of the target intensity. Then one can stop the iterations as soon 
as this threshold is crossed. 

4. CONVERGENCE ANALYSIS OF 
MICRO-GENETIC ALGORITHMS 

The application of Markov Chain analysis to Genetic Algorithms was 
first introduced by Rudolph [15], and developed in his book [16]. A 
more recent paper by Agapie [1] investigates the minimum criteria for 
convergence. 

The main ideas are: (i) the population at any one point in the iteration 
can be viewed as a state, (ii) the finite binary representation of 
the variables makes the system a finite state machine, (iii) iterations 
are transitions from one state to another. Basic results from stochastic 
matrices [8, 13, 17] will be useful. 

With this in mind, let $x$ be a binary vector of length $\nu$, and let $m + 1$ be 
the total number of members in a population. Each state is represented 
by a concatenation of $m + 1$ vectors $s = [x_0, x_1, \ldots, x_m]$. We denote the 
state space by $S$. The total number of possible states is $|S| := L = 2^{\nu(m+1)}$. 
Now we index the states from 1 to $L$ as $s_1, s_2, \ldots, s_L$. Of all 
the possible states, there are $M = 2^\nu$ states in which the members $x_i$, 
$i = 0, 1, \ldots, m$, are identical. We define 

$$U = \{s_i : x_0 = x_1 = \cdots = x_m\} = \{s_l : l = l_1, l_2, \ldots, l_M\} \qquad (7)$$

say. Let $P\{\cdot\}$ denote the probability of the assertion $\cdot$. Consider the 
inner loop in the Micro-Genetic Algorithm, sketched in Figure 4, and let 
$s^{(k)}$ be the population after the k-th iteration. 

Theorem 1 The inner loop of the Micro-Genetic Algorithm in Figure 
4 converges to a uniform population in the sense that 

$$\lim_{k \to \infty} P\{s^{(k)} \in U\} = 1.$$

Proof Let $P_{\mathrm{inner}} = \{p_{ij}\}$ be the transition matrix of the inner loop for 
each iteration (generation); i.e., 

$$p_{ij} = P\{s_i \to s_j\},$$

by the process of saving the best member and performing uniform crossover. 
Note that $P_{\mathrm{inner}}$ will be stochastic (rows summing to 1) and have 
nonnegative entries. If we order the states so that the first $M$ states are 
in $U$, then the transition matrix $P_{\mathrm{inner}}$ has the form 

$$P_{\mathrm{inner}} = \begin{bmatrix} I & 0 \\ A & B \end{bmatrix},$$

where $I$ is the $M \times M$ identity matrix. This is because, if $s^{(k)} \in U$, then 
$s^{(k)}$ is not altered by the uniform crossover of the inner loop. 

Next we study the properties of $A$ and $B$. Note that the block $A$ is 
of size $(L - M) \times M$, and corresponds to the transition from $s^{(k)} \notin U$ 
to $s^{(k+1)} \in U$. This means that it has only a single nonzero entry per 
row, corresponding to the element of $U$ that is composed of $m + 1$ copies 
of the ‘best’ chromosome that was saved at the beginning of the inner 
iteration. The probability of reproducing $m$ copies of the first member 
of $s^{(k)}$ is at least $(1/m)^{m\nu}$, which provides the crude lower bound 

$$P\{s^{(k+1)} \in U \mid s^{(k)} \notin U\} \ge a := \left(\frac{1}{m}\right)^{m\nu}.$$

Therefore, if $a_{ij}$ is the nonzero entry in row $i$ of $A$, then $a_{ij} \ge a > 0$. 

The submatrix $B$ is of size $(L - M) \times (L - M)$. Since $P_{\mathrm{inner}}$ is 
stochastic, 

$$\sum_j b_{ij} \le 1 - a < 1.$$

By direct calculation, we find that 

$$P_{\mathrm{inner}}^k = \begin{bmatrix} I & 0 \\ A_k & B^k \end{bmatrix},$$

where $A_k = \left(\sum_{i=0}^{k-1} B^i\right) A = (I - B)^{-1}(I - B^k)A$. Now, if a matrix $B$ 
has rows whose sums are less than one, then $B^k \to 0$ as $k \to \infty$. It 
follows that 

$$P_{\mathrm{inner}}^\infty := \lim_{k \to \infty} P_{\mathrm{inner}}^k = \begin{bmatrix} I & 0 \\ A_\infty & 0 \end{bmatrix},$$

with $A_\infty = (I - B)^{-1}A$. 

Therefore, starting with any distribution $p^{(0)}$, the final distribution 

$$p^{(\infty)} = p^{(0)} P_{\mathrm{inner}}^\infty$$

has the property $p_j^{(\infty)} = 0$ for $j = M + 1, \ldots, L$, thus proving the 
assertion of the theorem. ■ 

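A convergence rate for the inner loop can be estimated. From the 
probability transition $p^{(k+1)} = p^{(k)} P_{\mathrm{inner}}$, we deduce 

$$P\{s^{(k)} \in U\} = \sum_{\{i : s_i \in U\}} p_i^{(k)} = 1 - P\{s^{(k)} \notin U\} = 1 - \sum_{\{i : s_i \notin U\}} p_i^{(k)}.$$

Further, the nature of the transition matrix implies 

$$p_i^{(k+1)} = \sum_{\{j : s_j \notin U\}} p_j^{(k)} b_{ji}, \qquad \{i : s_i \notin U\},$$

where $B = [b_{ij}]$. By using $\sum_j b_{ij} \le 1 - a$ for all $i$, and by summing 
both sides of the equation, we get 

$$\sum_{\{i : s_i \notin U\}} p_i^{(k+1)} = \sum_{\{i : s_i \notin U\}} \sum_{\{j : s_j \notin U\}} p_j^{(k)} b_{ji} = \sum_{\{j : s_j \notin U\}} p_j^{(k)} \sum_{\{i : s_i \notin U\}} b_{ji} \le (1 - a) \sum_{\{j : s_j \notin U\}} p_j^{(k)}.$$

Therefore we have established 

$$P\{s^{(k)} \in U\} \ge 1 - (1 - a)^k,$$

where $a = (1/m)^{m\nu}$. 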
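
The bound can be checked numerically on a toy chain. The sketch below is an illustration under assumptions made here: the 4-state matrix `P`, with two absorbing "uniform" states and two transient states, is hypothetical and is not derived from a real population; the constant `a = 0.1` is the smallest nonzero entry of its block $A$.

```python
def step(p, P):
    """One transition of a row distribution: p -> p P."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

def absorbed_mass(P, p0, n_uniform, steps):
    """Probability of being in the first n_uniform (absorbing) states after each step."""
    p, masses = list(p0), []
    for _ in range(steps):
        p = step(p, P)
        masses.append(sum(p[:n_uniform]))
    return masses

# Hypothetical chain in the block form [[I, 0], [A, B]] used in the proof.
P = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.2, 0.0, 0.5, 0.3],
    [0.0, 0.1, 0.6, 0.3],
]
a = 0.1  # every row of B sums to at most 1 - a

masses = absorbed_mass(P, [0.0, 0.0, 0.5, 0.5], n_uniform=2, steps=200)
for k, mass in enumerate(masses, start=1):
    # The estimate P{s^(k) in U} >= 1 - (1 - a)^k, up to rounding error.
    assert mass >= 1.0 - (1.0 - a) ** k - 1e-12
print(masses[0], masses[-1])
```

In this example the chain is absorbed much faster than the bound predicts, which is consistent with the bound being described as crude.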

Each single step of the outer loop of the Micro-Genetic Algorithm can 
be expressed using the transition matrix 

$$P_{\mathrm{outer}} = \mathcal{M} P_{\mathrm{inner}}^\infty, \qquad (8)$$

where $\mathcal{M}$ represents the mutation process, and where the entire inner 
loop has been encapsulated in $P_{\mathrm{inner}}^\infty$. Now we can state the convergence 
result. 



Theorem 2 Assume that there is a unique global maximizer of the optimization 
problem (6). Let the probability of mutation in the outer loop 
be fixed with $p_m \in (0, 1)$. Then the iterations of the Micro-Genetic Algorithm 
of Figure 4 converge to a population consisting of $m + 1$ copies 
of the global maximizer of the cost function. 



Proof Recall that an unknown binary vector of length $\nu$ can take $2^\nu$ 
values. When these $2^\nu$ different vectors are arranged in descending order 
of fitness, we denote them by $x_1, x_2, \ldots, x_{2^\nu}$. Moreover, we order the 
populations of size $(m + 1)$ by class. The first class contains populations 
whose first member is $x_1$, the second class contains populations whose 
first member is $x_2$, and so on, where the members of each population 
need not be in order of fitness. We have a total of $2^\nu$ classes. In each 
class, we let the first population be the one that has $m + 1$ copies of 
its leading member. This process is nothing more than renumbering the 
states in a particular order. 

Next, recall that the inner loop takes any population into one that 
has identical members. Consider the transition 

$$p^{(k)} \mapsto p^{(k)} P_{\mathrm{inner}}^\infty.$$

The new vector has only $2^\nu$ nonzeros, corresponding to the populations 
whose members are identical in the $2^\nu$ classes. Note also that 
the populations can only go up in class, or remain in the same class, 
because we always save the fittest member of the population. Therefore 
we deduce that the matrix $P_{\mathrm{inner}}^\infty$ must be lower triangular, and of the 
form 

$$P_{\mathrm{inner}}^\infty = \begin{bmatrix} w^{(1,1)} & & & \\ w^{(2,1)} & w^{(2,2)} & & \\ \vdots & \vdots & \ddots & \\ w^{(2^\nu,1)} & w^{(2^\nu,2)} & \cdots & w^{(2^\nu,2^\nu)} \end{bmatrix}. \qquad (9)$$

In (9), each $w^{(i,j)}$ stands for a single column, the first column of block $(i, j)$; 
the nonzero elements of the matrix are confined to these $2^\nu$ columns. 
From our indexing scheme, we know that 

$$w^{(1,1)} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix},$$

because $P_{\mathrm{inner}}^\infty$ is stochastic and each row of the block corresponding to 
the first class has a single nonzero entry. Therefore the entry must be 
1. Let us consider $w^{(i,1)}$. Each element of this vector corresponds to 
the probability of a transition to the first class from class $i$. We argue 
that at least one element of $w^{(i,1)}$ must be equal to 1. These elements 
correspond to the existence of populations in class $i$ with probability 1 
of exiting the inner loop in class 1. They have $x_1$ as a member, and 
therefore would immediately be promoted to class 1 by the inner loop, 
and would stay in that class thereafter. 

Now we look at the transition matrix $\mathcal{M}$ corresponding to mutation. 
The population, upon exit from the inner loop, has $m + 1$ identical members. 
A total of $m$ new members are created by sequentially ‘flipping’ 
bits of the chromosome with probability $p_m$. The probability that every 
bit is flipped is $p_m^{\nu m}$, whereas the probability that every bit remains the 
same is $(1 - p_m)^{\nu m}$. Therefore the nonzero part of the transition matrix 
$\mathcal{M}$ satisfies 

$$\min_{i,j} [\mathcal{M}]_{ij} \ge \alpha := \min\{p_m^{\nu m}, (1 - p_m)^{\nu m}\}.$$

We keep the original member before mutation and label it as the first 
member, thus defining the class of the population, so the mutation process 
cannot change the class. Therefore the transition matrix corresponding 
to mutation has the block diagonal structure 

$$\mathcal{M} = \begin{bmatrix} M_1 & & & \\ & M_2 & & \\ & & \ddots & \\ & & & M_{2^\nu} \end{bmatrix},$$

where each block is a $2^{\nu m}$ by $2^{\nu m}$ stochastic matrix. Using (9) and the 
above, the product (8) takes the form 



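$$P_{\mathrm{outer}} = \begin{bmatrix} \hat{w}^{(1,1)} & & & \\ \hat{w}^{(2,1)} & \hat{w}^{(2,2)} & & \\ \vdots & \vdots & \ddots & \\ \hat{w}^{(2^\nu,1)} & \hat{w}^{(2^\nu,2)} & \cdots & \hat{w}^{(2^\nu,2^\nu)} \end{bmatrix},$$

where $\hat{w}^{(i,j)} = M_i w^{(i,j)}$. The fact that $M_1$ is stochastic implies 

$$\hat{w}^{(1,1)} = M_1 w^{(1,1)} = [1, 1, \ldots, 1]^T.$$

Observe also that, since every entry of $M_i$ is at least $\alpha$, each component $l$ satisfies 

$$\hat{w}^{(i,1)}_l = \sum_j [M_i]_{lj}\, w^{(i,1)}_j \ge \alpha \sum_j w^{(i,1)}_j \ge \alpha, \qquad (10)$$

the last inequality being valid because we have deduced that at least one 
element of $w^{(i,1)}$ is unity. Further, showing only the first row of $P_{\mathrm{outer}}$ 
explicitly, we can write 

$$P_{\mathrm{outer}} = \begin{bmatrix} 1 & 0 \\ \ast & Q \end{bmatrix},$$

which implies 

$$P_{\mathrm{outer}}^k = \begin{bmatrix} 1 & 0 \\ \ast & Q^k \end{bmatrix}.$$

The matrix $Q$ has the property $\sum_j Q_{ij} \le 1 - \alpha < 1$ by (10), so $Q^k \to 0$ 
as $k \to \infty$. It follows from the stochastic property of $P_{\mathrm{outer}}$ that 

$$\lim_{k \to \infty} P_{\mathrm{outer}}^k = P_{\mathrm{outer}}^\infty = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 0 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 1 & 0 & 0 & \cdots & 0 \end{bmatrix}.$$

Therefore, the probability that the population reaches a state in which 
all the members are $x_1$ is 1 in the limit. ■ 

Figure 5 The target intensity pattern. 

Figure 6 The intensity pattern of the optimized diffractive optical element. 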

5. NUMERICAL EXPERIMENT 

We applied a version of the Micro-Genetic Algorithm to the problem 
(3). In the example, we chose a symmetric profile with 1000 unknowns. 
The half-aperture is $x \in [0, 10]$, so the width of each ‘bump’ is 0.001. 
The diffractive optical element has 4 levels, so $A = \{0, 1, 2, 3\}$. The 
desired target image was derived from a rough sampling of the intensity 
at the image plane $d = 1$ over the interval $x \in [0, 11]$. It was obtained 
by solving the forward problem with a known diffractive optical element 
at wavenumber $k = 25$, and is shown in Figure 5. The image found by 
optimization is given below it in Figure 6. 

We used a version of the optimization algorithm that changes the 
mutation probability $p_m$ adaptively. The algorithm ran for a total of 500 
iterations of the outer loop, and each iteration ran for 100 generations. 
The total number of function evaluations with the population size 
$(m + 1) = 5$ is $2 \times 10^5$. Note that the search space has over $10^{600}$ elements. 
The reduction in the residual at the end of each iteration in the outer 
loop is shown in Figure 7. Part of the calculated profile found is shown 
in Figure 8. 

Figure 7 The progress of the optimization process. Since we fix the number of inner 
loop iterations to 100, the exiting population is not uniform. This graph shows the 
average fitness and the fitness of the best member upon exit from the inner loop. 

Figure 8 Part of the optimized diffractive optical element. The phase is shown as a 
function of position. 
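
The sizes quoted for this experiment can be checked with a few lines of arithmetic. In the sketch below the variable names are chosen here for illustration, and the count of 4 new chromosomes per generation is an assumption (a population of 5 with the best member carried over unchanged).

```python
import math

# Search space: 1000 unknowns, each taking one of 4 levels.
levels, unknowns = 4, 1000
log10_states = unknowns * math.log10(levels)      # decimal exponent of 4**1000

# Function evaluations: 500 outer iterations of 100 generations each,
# assuming 4 newly created chromosomes are evaluated per generation.
outer_iterations, generations, new_per_generation = 500, 100, 4
evaluations = outer_iterations * generations * new_per_generation

print(round(log10_states, 2), evaluations)
```

Under these assumptions the run evaluates only a vanishing fraction of the search space, which is why the stopping question discussed next matters in practice.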

We were able to run larger examples, with 20000 unknowns, with some 
success. A disturbing feature of the Micro-Genetic Algorithm, however, 
is that many iterations may fail to reduce the cost function, which makes 
it difficult to decide when to stop. 

We devised a simple way to obtain crude bounds on the value of the 
cost function at the optimum. To obtain a lower bound, we relax the 
search space to continuous variables. This is an unconstrained optimization 
problem and can be solved by standard methods. In order to get 
an answer $\varphi(x)$ that is between 0 and $2\pi$, we simply add or subtract 
integer multiples of $2\pi$ if necessary. To obtain a crude upper bound, we 
take the values of $\varphi(x)$ calculated in the manner described, and round 
them to the nearest integer multiples of the phase increment. While 
these bounds are crude, they do provide some guidance for stopping the 
iterations. 
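
The wrapping and rounding steps used for the upper bound can be sketched as follows. This is an illustration only: the function names are assumptions, the input phases are hypothetical, and evaluating the cost function at the rounded profile (which requires the forward model) is not shown.

```python
import math

def wrap_phase(phi):
    """Shift phi by an integer multiple of 2*pi so that it lies between 0 and 2*pi."""
    return phi % (2.0 * math.pi)

def round_to_levels(phases, num_levels):
    """Round each wrapped phase to the nearest multiple of 2*pi/num_levels.

    Returns integer levels in 0..num_levels-1; a phase that rounds up to
    2*pi is identified with level 0.
    """
    step = 2.0 * math.pi / num_levels
    return [int(round(wrap_phase(phi) / step)) % num_levels for phi in phases]

# Hypothetical continuous solution of the relaxed problem, 4 phase levels.
continuous = [0.1, math.pi / 2 + 0.2, -0.1, 7 * math.pi]
print(round_to_levels(continuous, 4))
```

The resulting integer profile is feasible for the original problem, so its cost is an upper bound on the optimal value, while the cost of the continuous solution is a lower bound.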

6. DISCUSSION 

We have described a problem of designing diffractive optical elements 
using an optimization strategy. Because of the integer constraints imposed 
by the manufacturing process, and the size of the problem, we 
chose a rather unconventional method for solving the optimization calculation. 
The method is a Micro-Genetic Algorithm. We have analyzed 
its convergence properties, finding that, while there is convergence, there 
is no easy way to estimate the convergence rate. Numerical experiments 
with the method yield satisfactory results. A good match of the target 
intensity profile is usually achieved by this method. 

Abstract Semidefinite Programming is currently a very exciting and active area of 
research. Semidefinite relaxations generally provide very tight bounds 
for many classes of numerically hard problems. In addition, these re- 
laxations can be solved efficiently by interior-point methods. 

In this paper we study these semidefinite relaxations using the equiv- 
alent Lagrangian relaxations. In particular, the theme of the paper is to 
show that the Lagrangian relaxation is, in some respects, best. In all in- 
stances we consider, we show that whenever we have a tractable bound 
(relaxation), then the same bound can be obtained from a Lagrangian 
relaxation. 

Keywords: Semidefinite Programming, Lagrangian Duality, Relaxations, Quadratic 
Constrained Quadratic Programs, Hard Combinatorial Problems. 

1. INTRODUCTION 

Semidefinite Programming (denoted SDP and sometimes called lin- 
ear matrix inequalities, LMIs) is a generalization of linear programming 
(denoted LP), where the nonnegativity constraints on vector variables 
are replaced by positive semidefinite constraints on symmetric matrix 
variables. These problems have an old history dating back more than 
100 years to Lyapunov’s theory for stability of differential equations, e.g. 
[82, 71]. They were studied and applied in engineering applications as 
early as the 1960s, e.g. [114, 116, 115], and continued into the 1980s 
(see e.g. the historical outline in [20] and the work on matrix completion 
problems in [27, 37, 60]). In addition, SDP is a special case of optimization 
over cone constraints (generalized linear programming), which 
dates back more than 30 years to e.g. Bellman and Fan [11], and was an 
ongoing active area of research, e.g. [12, 41, 23, 24, 55, 122, 79, 19]. 

The last ten years have seen an enormous interest in the SDP area, 
due to many new and important applications in, e.g. combinatorial opti- 
mization, engineering (systems and control), statistics, etc. This interest 
increased greatly due to the fact that SDP problems can be solved effi- 
ciently (are tractable) by the new interior-point methods, e.g. [77]. One 
of the interesting side effects of the activity is that it has brought various 
different areas of research into contact. For example, the people working 
on numerical issues for large scale problems are now using sophisticated 
techniques in convex analysis, and using adjoint operators rather than 
matrix representations is becoming common. We are all benefiting from 
the elegance and applicability of this area. 

Combinatorial and discrete optimization problems often involve binary 
(0, 1 or ±1) decision variables. These can be modelled using quadratic 
constraints $x^2 - x = 0$, or $x^2 = 1$, respectively. Thus many hard 
combinatorial problems can be modelled using quadratically constrained 
quadratic programs, denoted Q²P. However, these latter problems can 
be just as hard (intractable) to solve. Therefore, relaxations are used to 
find approximate bounds and solutions. One can use linear approximations 
and obtain models that can be solved efficiently (tractable models). 
Moreover, using the positive semidefinite matrix construction $X = xx^T$, 
we can lift the problem into matrix space and obtain a Semidefinite 
Programming Relaxation by ignoring the rank one restriction on X, see 
e.g. [65, 38, 66, 102, 8, 7, 78]. This lifting process provides surprisingly 
stronger bounds, both empirically and theoretically, than have previ- 
ously been found, e.g. [35, 47, 3]. Thus SDP provides a means of finding 
an approximate solution to quadratic models for hard problems. 

In this paper we study SDP relaxations and explore the relationship 
between the SDP and Lagrangian relaxations of various classes 
of Q²Ps. (This continues the work in e.g. [103, 91].) We then consider 
the strength of these relaxations, in the sense of strong duality. 
We see that in the simplest case of one constraint (the so-called trust re- 
gion subproblem, TRS), strong duality holds. However, even two convex 
constraints (the CDT problem) can result in a duality gap. Therefore, 
it is surprising that there is a class of matrix problems with orthogonal 
constraints for which there is a zero duality gap. This motivates adding 
certain nonlinear redundant constraints in order to derive a strengthened 
SDP relaxation for the max-cut problem. 

Throughout this paper we emphasize the theme (or conjecture) that 
Lagrangian relaxation is somehow best. Though this question is very 
vague, so perhaps an answer may not be available, it does give the 
flavour of the approach used in the paper. 

The paper is organized as follows. We begin in Section 2.1 with a 
well known problem in this area, the Max-Cut problem. We present 
several different relaxations. Following our theme, all these bounds, 
including the SDP bound, end up being equivalent to the Lagrangian 
relaxation. We then discuss the TRS and the CDT problem and the 
difference in strong duality for them. This is followed by finding the SDP 
relaxation for general Q²P in Section 2.2. It includes descriptions of the 
relationships between the SDP relaxation and the Lagrangian relaxation 
via convex quadratic valid inequalities, following [33, 52]. Occurrences of 
strong duality for nonconvex quadratic programs are studied in Section 
2.3. In every instance where one has a tractable bound, we find a Q²P 
such that the bound is attained by the Lagrangian relaxation. This 
follows the work in [6, 5]. 

We conclude with a strengthened SDP bound based on a second lifting 
procedure. This illustrates a recipe for constructing semidefinite relaxations 
using the Lagrangian dual of the Lagrangian dual of the original 
Q²P. In addition, we see the advantages of this approach as redundant 
constraints added at the start provide strengthened bounds, but do not 
result in redundancy in the final SDP relaxation. 

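1.1. LAGRANGE MULTIPLIERS FOR Q²P 

The Q²P in $x$ is 

$$(\mathrm{Q^2P}_x) \qquad \begin{array}{rl} \mu^* := \min & q_0(x) := x^T Q_0 x + 2 g_0^T x + \alpha_0 \\ \text{s.t.} & q_k(x) := x^T Q_k x + 2 g_k^T x + \alpha_k \le 0, \quad k \in \mathcal{I} := \{1, \ldots, m\}, \end{array}$$

where $Q_k \in \mathcal{S}^n$, the space of symmetric matrices. The Lagrangian of 
Q²P$_x$ is 

$$L(x, \lambda) = q_0(x) + \sum_{k \in \mathcal{I}} \lambda_k q_k(x),$$

where $\lambda = (\lambda_k) \ge 0$ are nonnegative Lagrange multipliers. 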

Lagrange multipliers can be used in two ways. First, if a constraint 
qualification holds for Q²P$_x$ at the optimum $x$ (e.g. the Mangasarian-Fromovitz 
constraint qualification), then the Karush-Kuhn-Tucker necessary 
conditions for optimality are satisfied, i.e. 

$$\nabla L(x, \lambda) = 0 \quad \text{and} \quad \lambda_k q_k(x) = 0, \quad \forall k \in \mathcal{I},$$

for some $0 \le \lambda \in \mathbb{R}^m$. Therefore the search for the optimum $x$ can 
be restricted to the points satisfying stationarity of the Lagrangian and 
complementary slackness. Moreover, if the Lagrangian is also convex, 
then these (and primal feasibility) are sufficient conditions for optimality. 

Lagrange multipliers can also be used to derive the Lagrangian dual 
(or relaxation) of Q²P$_x$ 

$$(\mathrm{DQ^2P}_x) \qquad \nu^* := \max_{\lambda \ge 0} \min_x\; q_0(x) + \sum_{k \in \mathcal{I}} \lambda_k q_k(x),$$

i.e. each inner unconstrained minimization problem provides a lower 
bound for Q²P$_x$ and we then choose the best of these lower bounds. 
A zero duality gap holds if $\mu^* = \nu^*$, but this condition can fail in the 
nonconvex case. Strong duality holds if $\mu^* = \nu^*$ and also $\nu^*$ is attained. 
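
A small worked instance may help fix the definitions. The one-constraint problem below is a hypothetical example chosen here, with a nonconvex objective $q_0(x) = -x^2$ and the constraint $q_1(x) = x^2 - 1 \le 0$; it is not taken from the paper.

```latex
% Hypothetical one-constraint instance: q_0(x) = -x^2, q_1(x) = x^2 - 1.
\begin{align*}
\mu^* &= \min \{ -x^2 : x^2 - 1 \le 0 \} = -1
      \quad (\text{attained at } x = \pm 1), \\
L(x, \lambda) &= -x^2 + \lambda (x^2 - 1) = (\lambda - 1) x^2 - \lambda, \\
\min_x L(x, \lambda) &=
  \begin{cases}
    -\lambda, & \lambda \ge 1, \\
    -\infty,  & 0 \le \lambda < 1,
  \end{cases} \\
\nu^* &= \max_{\lambda \ge 0} \min_x L(x, \lambda)
       = \max_{\lambda \ge 1} (-\lambda) = -1
      \quad (\text{attained at } \lambda = 1).
\end{align*}
```

Here $\mu^* = \nu^* = -1$ and the dual value is attained, so strong duality holds even though the objective is not convex.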

Remark 1.1 Unfortunately, the term strong duality is ambiguous in 
the literature as it is sometimes used to define a zero duality gap with 
both primal and dual attainment. 

Remark 1.2 Let $q := (q_k)$. Then the sum in the Lagrangian can be
rewritten as $\langle \lambda, q(x) \rangle = \lambda^T q(x)$. This appears to be too trivial to mention.
However, it does emphasize how Lagrange multipliers arise when one
is dealing with matrix valued constraints. In particular, if one has a
constraint $Q(x) = 0$, where the image of $Q$ is a symmetric matrix, then
the Lagrange multiplier will be a symmetric matrix, say $\Lambda = \Lambda^T$, and
the term in the Lagrangian will be the inner product $\langle Q(x), \Lambda \rangle$. With a
nonnegativity constraint, there will be a sign restriction on the Lagrange 
multiplier. 



1.2. SEMIDEFINITE PROGRAMMING 
PRELIMINARIES 

SDP is an extension of LP in that matrix variables replace vector
variables and nonnegativity elementwise is replaced by positive
semidefiniteness. In addition, it is a special case of the cone programming
problem

$$
\begin{array}{rl}
\min & f(x) \\
\text{s.t.} & g(x) \succeq_K 0,
\end{array}
\qquad (1)
$$

where $K$ is a convex cone ($K + K \subseteq K$, $\alpha K \subseteq K$, $\forall \alpha \ge 0$) and
$g(x) \succeq_K 0$ denotes the cone partial order $g(x) \in K$. If $K = \Re^n_+$ then we
get the usual elementwise ordering $g(x) \ge 0$. This notation allows comparisons
with many results that hold for the elementwise ordering, e.g. Jensen's
inequality for the definition of convexity. In the case that $K = \mathcal{P}$, the
cone of positive semidefinite matrices, we have the Löwner partial
order, e.g. [69]. In fact, this is a very general mathematical program
and includes standard inequality, equality, and semidefinite constraints,
when $K = \{0\} \otimes \Re^n_+ \otimes \mathcal{P}$, where $\{0\}$ is the set containing the 0 vector.

We now look at some of the relations between LP and SDP.

Geometry. Much of the elegant geometry of polyhedral sets
developed for LP can be extended to SDP. This was studied as early as
1948 by Bohnenblust [15] and later by Taussky [108] and also by Barker
and Carlson [10]. More recently, motivated by the high interest in SDP, 
many new results on differentiability and multiplicity of eigenvalues have 
appeared, see e.g. Lewis [63, 64] and Pataki [83, 84], respectively. In 
addition, results on characterizing different types of homogeneous and 
self-scaled cones appear in [42, 111, 44]. 

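Let $\mathcal{P}_n$ (or $\mathcal{P}$ when the meaning is clear) denote the cone of positive
semidefinite matrices in $\mathcal{S}^n$, the space of $n \times n$ symmetric matrices. The
following remarks have analogues in LP. The SDP cone is self-polar, i.e.

$$ \mathcal{P} = \mathcal{P}^+ := \{ Y : X \bullet Y \ge 0, \ \forall X \in \mathcal{P} \}, $$

where $X \bullet Y = \operatorname{Trace} XY$ is the trace inner product. Also, $\mathcal{P}$ is
homogeneous, i.e., for any $X, Y \in \operatorname{int}(\mathcal{P})$, there exists an invertible linear
operator $\mathcal{A}$ from $\mathcal{S}^n$ to $\mathcal{S}^n$ that leaves $\mathcal{P}$ invariant and that has the
property $\mathcal{A}(X) = Y$.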

A face of a cone $K$ is a convex subset $\mathcal{F} \subseteq K$ such that

$$ x \in \mathcal{F}, \quad y, z \in K, \quad x = \alpha y + (1 - \alpha) z, \quad 0 < \alpha < 1 \ \Longrightarrow \ y, z \in \mathcal{F}, $$

i.e. $x \in \mathcal{F}$ can be an interior point of the line segment that joins $y, z \in K$
only if $y, z \in \mathcal{F}$. The faces of $\Re^n_+$ can be characterized using any vector in
the relative interior, i.e. for $y$ in the relative interior of a face, we define the vector
$z$ by $z_i = 1$ if $y_i = 0$, and $z_i = 0$ if $y_i > 0$. Thus $z$ is in the relative
interior of the complementary face. Then the corresponding face that
contains $y$ in its relative interior is

$$ \{ x \in \Re^n_+ : \langle x, z \rangle = 0 \} = \{ x \in \Re^n_+ : y_i = 0 \Rightarrow x_i = 0 \}. $$

The faces $\mathcal{F}$ of $\mathcal{P}$ can be similarly characterized using any $Y \in \operatorname{relint} \mathcal{F}$,
i.e.

$$ \mathcal{F} = \{ X \in \mathcal{P} : \mathcal{N}(X) \supseteq \mathcal{N}(Y) \}, $$

where $\mathcal{N}(\cdot)$ denotes nullspace. The faces of $\mathcal{P}$ are exposed, i.e.
$\mathcal{F} = \mathcal{P} \cap \phi^\perp$, where $\phi \in \mathcal{P} \cap \mathcal{F}^\perp$, a conjugate face. (Here the superscript $\perp$
denotes the orthogonal complement.) Moreover, they are projectionally
exposed, i.e. there exists a projection matrix $P$ such that $\mathcal{F} = P \mathcal{P} P$;
note that $P \cdot P$ is a projection on $\mathcal{S}^n$. In fact, we can choose $P$ (and so
$P \cdot P$) to be an orthogonal projection. See e.g. [87, 40, 39] for results on
faces and [18, 9, 93, 107] for results on projectionally exposed cones.

Duality and Optimality Conditions. Extensions of the optimality
conditions and duality theory from LP to SDP appeared in [11].
However, unlike the LP case, strong duality theorems required a Slater-
type constraint qualification (denoted CQ). This CQ is strict feasibility,
i.e. there exists a feasible point in the interior of $\mathcal{P}$. In [95] it is shown
that difficulties in the duality theory can arise due to a property of the
faces $\mathcal{F}$ of $\mathcal{P}$, namely that

$$ \mathcal{P} + \operatorname{span}(\mathcal{F}) \ \text{is never closed.} \qquad (2) $$

On the other hand, these difficulties can be corrected by taking
advantage of the fact that

$$ \mathcal{P} + \mathcal{F}^\perp \ \text{is always closed.} \qquad (3) $$

We amplify these statements in Remark 1.3 below.

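Now, consider the typical primal SDP

$$
(P) \qquad
\begin{array}{rl}
\mu^* := \min & C \bullet X \\
\text{s.t.} & \mathcal{A}(X) = b \ \text{and} \ X \succeq 0,
\end{array}
$$

where $C \in \mathcal{S}^n$, $b \in \Re^m$, and $\mathcal{A} : \mathcal{S}^n \to \Re^m$ is a linear operator. Thus
the components of the constraint equations are

$$ (\mathcal{A}(X))_i = \operatorname{Trace} A_i X = b_i, \quad i = 1, 2, \dots, m, $$

for some given symmetric matrices $A_i$.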

We can define the dual and prove weak duality using the notion of a
hidden constraint. Specifically we use the form

$$
\begin{array}{rcl}
\mu^* &=& \min_{X \succeq 0} \max_y \ C \bullet X + y^T (b - \mathcal{A}(X)) \\
&\ge& \max_y \min_{X \succeq 0} \ b^T y + (C - \mathcal{A}^*(y)) \bullet X \\
&=& \nu^*,
\end{array}
\qquad (4)
$$

where $\mathcal{A}^*$ is the adjoint operator of $\mathcal{A}$, so it satisfies

$$ \mathcal{A}(X)^T y = X \bullet \mathcal{A}^*(y), \quad \forall X, \ \forall y, $$

which implies $\mathcal{A}^*(y) = \sum_{i=1}^m y_i A_i$, the symmetric matrices $A_i$ being defined
above. The dual program is

$$
(D) \qquad
\begin{array}{rl}
\nu^* := \max & b^T y \\
\text{s.t.} & \mathcal{A}^*(y) \preceq C.
\end{array}
$$

The first equation in (4) follows from the fact that the inner
maximization is finite (the hidden constraint) if and only if $\mathcal{A}(X) = b$. The
inequality follows from taking the minimization first. The last equation 
follows from the hidden constraint, because y has to be such that the 
inner minimization is finite valued. 

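The duality theory gives rise to the elegant characterization of
optimality

$$
\begin{array}{rcll}
\mathcal{A}^*(y) + Z - C &=& 0 & \text{dual feasibility} \\
b - \mathcal{A}(X) &=& 0 & \text{primal feasibility} \\
ZX &=& 0 & \text{complementary slackness} \\
Z, X &\succeq& 0 & \text{positive semidefiniteness.}
\end{array}
\qquad (5)
$$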



The variables X, (y, Z) are called a primal-dual optimal pair, Z being 
the (dual) slack variable. If only complementary slackness fails, then the 
variables are called a primal-dual feasible pair. Note that complementary 
slackness can be written in the equivalent form $\operatorname{Trace} ZX = 0$, which
follows from an orthogonal diagonalization of Z and X. This is also
equivalent to having a zero duality gap between primal and dual values
for a feasible pair since

$$ \operatorname{Trace} ZX = \operatorname{Trace} \, (C - \mathcal{A}^*(y)) X = \operatorname{Trace} CX - y^T b. $$
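The identity between Trace ZX and the duality gap can be checked numerically on a tiny instance. The sketch below is only an illustration under assumptions made here, not part of the original development: it uses a single constraint Trace X = 1 (so the constraint matrix is the identity and b = 1), a diagonal cost matrix C, and hypothetical helper names.

```python
# Illustrative 2x2 instance (an assumption, not from the text):
# minimize C.X subject to Trace(X) = 1, X psd, i.e. A_1 = I and b = 1.

def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

C = [[1.0, 0.0], [0.0, 2.0]]
b = 1.0

def dual_slack(y):
    """Z = C - A*(y) = C - y*I for the single constraint matrix A_1 = I."""
    return [[C[0][0] - y, C[0][1]], [C[1][0], C[1][1] - y]]

def gap_two_ways(X, y):
    """Return (Trace ZX, Trace CX - b*y) for a primal-dual feasible pair."""
    Z = dual_slack(y)
    return trace(matmul(Z, X)), trace(matmul(C, X)) - b * y

# A feasible but non-optimal pair, and the optimal pair.
feasible_gap = gap_two_ways([[0.5, 0.0], [0.0, 0.5]], 0.5)
optimal_gap = gap_two_ways([[1.0, 0.0], [0.0, 0.0]], 1.0)
```

For the feasible pair both expressions give the same positive gap, and for the optimal pair both vanish, which matches complementary slackness ZX = 0.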



Remark 1.3 We mentioned above that the optimality conditions require
a CQ, i.e. strict feasibility, and that condition (2) can cause difficulties.
Consider the SDP P and its dual D. Dual feasibility can be written in
the form

$$ \mathcal{A}^*(y) + Z = C, \quad Z \succeq 0. $$

If we choose the linear operator $\mathcal{A}^*$ so that its range satisfies
$\mathcal{R}(\mathcal{A}^*) = \operatorname{span}(\mathcal{F})$, for some face $\mathcal{F}$ of $\mathcal{P}$, then we can choose any $C$ in the
set $\mathcal{R}(\mathcal{A}^*) + \mathcal{P}$, but not its closure, because choices in the closure can
force failure of dual feasibility and a duality gap. For example, let

$$ A := \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \quad \text{and so} \quad \mathcal{A}^*(y) = yA. $$

Then $C = \begin{pmatrix} \epsilon & 1 \\ 1 & 0 \end{pmatrix}$ is admissible if and only if $\epsilon$ is positive.
Therefore we address $C = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ and $b = 0$. In this case, the
primal problem constraint $\mathcal{A}(X) = b$ implies $X_{22} = 0$, and then the other
constraint $X \succeq 0$ implies $X_{12} = X_{21} = 0$. Thus $\mu^* = C \bullet X = 0$ is optimal
in P. We have noted, however, that the dual is infeasible for the given
choice of $C$.

The condition (3) can correct this problem by allowing a larger set of
dual variables. Modified optimality conditions without any CQ can be
obtained. This was done in [17] and also in [97, 95]. In our primal problem
P, we let $\mathcal{F}$ be the minimal face, i.e. the smallest face containing the
feasible set. Then the strengthened dual is $\nu^* = \max b^T y$ s.t. $\mathcal{A}^* y \preceq_{\mathcal{F}^+} C$.
For our example, the minimal face $\mathcal{F}$ is the ray generated by
$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, which implies
that $C - (-1)A$ is in $\mathcal{F}^+$. Hence the new dual optimal value is 0, and
we have a zero duality gap.
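The 2x2 example of this remark can be checked directly, since a symmetric 2x2 matrix is positive semidefinite exactly when its diagonal entries and its determinant are nonnegative. The following is a minimal sketch with hypothetical helper names; the grid of candidate multipliers is an arbitrary choice made here, and the infeasibility for $\epsilon = 0$ follows from the determinant being $-1$ for every $y$, not from the grid search itself.

```python
# Sketch for the 2x2 example of Remark 1.3 (helper names are illustrative).
# Dual feasibility asks for a scalar y with Z = C - y*A psd, where
# A = [[0, 0], [0, 1]] and C = [[eps, 1], [1, 0]], so Z = [[eps, 1], [1, -y]].

def is_psd_2x2(a, b, d):
    """[[a, b], [b, d]] is psd iff a >= 0, d >= 0 and a*d - b*b >= 0."""
    return a >= 0 and d >= 0 and a * d - b * b >= 0

def dual_feasible(eps, y):
    return is_psd_2x2(eps, 1.0, -y)

def some_dual_feasible_y(eps, candidates):
    """Return the first candidate y that is dual feasible, or None."""
    for y in candidates:
        if dual_feasible(eps, y):
            return y
    return None

# Candidate multipliers y = -0.001, -0.01, ..., -1e6 (arbitrary grid).
grid = [-(10.0 ** k) for k in range(-3, 7)]

y_for_positive_eps = some_dual_feasible_y(0.5, grid)   # needs y <= -2
y_for_zero_eps = some_dual_feasible_y(0.0, grid)       # det(Z) = -1 for all y
```

With $\epsilon = 0.5$ a feasible multiplier is found, while for $\epsilon = 0$ none exists, in line with the statement that $C$ may be taken from $\mathcal{R}(\mathcal{A}^*) + \mathcal{P}$ but not from its closure.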

If we perturb the complementary slackness conditions,

$$ ZX = \mu I, \quad \mu > 0, $$

where $\mu$ is a parameter, then (5) becomes the (modified) optimality
conditions of a log-barrier problem. These are the equations that are used
in interior-point methods. However, unlike linear programming, we have 
an interesting subtle complication that is also discussed in [113]. One 
cannot apply Newton’s method directly since ZX is not necessarily sym- 
metric, and so we end up with an overdetermined system of equations. 
There are various ways of modifying this system in order to get good 
search directions, see e.g. [110, 72, 57]. Many of these directions work 
very well in practice, which is clear from empirical evidence and the 
derivation of several public domain codes, e.g. [47, 1, 105, 106, 16]. The 
SDP problems are tractable because they are convex programs and fall 
into the class of problems that can be approximately solved to a desired 
accuracy in polynomial time. This follows from the seminal work of 
Nesterov and Nemirovski [76, 77]. The algorithms that currently work 
well are the primal-dual interior-point algorithms. This area of research 
is ongoing, however, and there are many classes of problems with spe- 
cial structure where dual algorithms based on a bundle trust approach 
perform better; this is especially true if it is too expensive to evaluate 
the primal matrix X explicitly, see e.g. [13, 46, 61]. 
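The complication mentioned above, that ZX need not be symmetric even when Z and X are, is easy to see on a 2x2 example. The sketch below also forms the average (ZX + XZ)/2, which is one possible symmetrization; it is shown only as an illustration, and no claim is made that it coincides with any particular search direction from the cited references. The matrices and function names are assumptions chosen here.

```python
def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_symmetric(M, tol=1e-12):
    n = len(M)
    return all(abs(M[i][j] - M[j][i]) <= tol
               for i in range(n) for j in range(n))

def symmetrized_product(Z, X):
    """(ZX + XZ)/2, one possible symmetrization of the product."""
    ZX, XZ = matmul(Z, X), matmul(X, Z)
    n = len(Z)
    return [[0.5 * (ZX[i][j] + XZ[i][j]) for j in range(n)] for i in range(n)]

Z = [[2.0, 1.0], [1.0, 1.0]]   # symmetric positive definite
X = [[1.0, 0.0], [0.0, 3.0]]   # symmetric positive definite
ZX = matmul(Z, X)              # not symmetric, although Z and X are
```

Because ZX lives in the space of all square matrices, the equation $ZX = \mu I$ contributes $n^2$ scalar equations for symmetric unknowns, which is the source of the overdetermined system.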

First and second order optimality conditions for SDP are given in 
[100, 101] and in the survey article [113], for example. Nondegeneracy 
and strict complementarity are discussed in [2, 85]. Both nondegeneracy 
and strict complementarity can fail, though they are generic conditions. 
In addition the theorem of Goldman and Tucker [36], about the exis- 
tence of an optimal primal-dual pair that satisfies strict complementary 
slackness, does not apply to SDP. Note that strict complementarity for 
SDP is the condition $Z + X \succ 0$.



2. RELAXATIONS OF Q²P

We now look at a particular instance of Q²P, namely the quadratic
model for the Max-Cut problem. We start with several different tractable
relaxations for this problem that have appeared in the literature. We
show that, surprisingly, they are all equal to the Lagrangian (and SDP)
relaxation.

We then consider trust region type problems and discuss when strong 
duality holds. We include problems where orthogonal constraints arise, 
e.g. orthogonal relaxations of the quadratic assignment and graph par- 
titioning problems. Thus this part of the chapter emphasizes the theme 
about the strength of Lagrangian relaxation. 

2.1. RELAXATIONS FOR THE MAX-CUT 
PROBLEM 

One of the problems for which the SDP relaxation has been
particularly successful, both empirically and theoretically, is the Max-Cut
Problem, e.g. [45, 35, 34]. Let $G = (V, E)$ be an undirected graph with
vertex set $V = \{v_i\}_{i=1}^n$ and weights $w_{ij}$ on the edges $(v_i, v_j) \in E$. We seek
the index set $I \subseteq \{1, 2, \dots, n\}$ that maximizes the sum of the weights
of the edges with one end point with index in $I$ and the other in the
complement. This is equivalent to

$$ (MC) \qquad \max \ \tfrac{1}{2} \sum_{i<j} w_{ij} (1 - x_i x_j), \quad x \in \mathcal{F}, $$

where $\mathcal{F} := \{\pm 1\}^n$, and $x_i = 1$ if $i \in I$, and $x_i = -1$ otherwise. The
objective function is a (homogeneous) quadratic form, $x^T Q x$.
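For very small graphs the ±1 model can be evaluated by brute force, which makes the meaning of the objective concrete. This is only an illustrative sketch: the function names and the unit-weight triangle are choices made here, and enumeration over all $2^n$ sign vectors is of course not a practical method.

```python
from itertools import product

def cut_value(weights, x):
    """(1/2) * sum_{i<j} w_ij * (1 - x_i x_j) for x in {+1, -1}^n."""
    n = len(x)
    return 0.5 * sum(weights[i][j] * (1 - x[i] * x[j])
                     for i in range(n) for j in range(i + 1, n))

def max_cut_brute_force(weights):
    """Enumerate all 2^n sign vectors; only sensible for tiny graphs."""
    n = len(weights)
    return max((cut_value(weights, x), x)
               for x in product((1, -1), repeat=n))

# Unit-weight triangle: every nontrivial cut separates one vertex
# from the other two and therefore has weight 2.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
best_value, best_x = max_cut_brute_force(triangle)
```

Each edge contributes its weight exactly when its endpoints have opposite signs, since then $1 - x_i x_j = 2$.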

Several Different Relaxations. We rewrite MC as the more
general problem

$$ (MCQ) \qquad \mu^* := \max_{x \in \mathcal{F}} q_0(x), \quad \text{where } q_0(x) := x^T Q x - 2 c^T x. \qquad (6) $$

The MCQ (and MC) problem is intractable, but there are many
different ways to relax the problem and find approximate solutions and/or
bounds. The simplest way is to relax the constraints to the interval
conditions $x \in [-1, 1]^n$. This bound constrained quadratic problem is
NP-hard if Q is not negative semidefinite [81]; while, if Q is negative
semidefinite and c = 0, then the solution is the trivial 0 solution.

Other relaxations are also geometric in nature and involve
perturbations of the objective function. For example, one method relaxes the
constraints to the ball of radius $\sqrt{n}$, while another relaxes the
constraints to their convex hull, i.e. to the unit cube.

The relaxations yield bounds which are derived by making changes to
the objective function $q_0$ that are zero when $x_i^2 = 1$, i.e. on the feasible
set $\mathcal{F}$. In particular, for every $u \in \Re^n$ we have

$$ q_u(x) := x^T (Q + \operatorname{Diag}(u)) x - 2 c^T x - u^T e = q_0(x), \quad \forall x \in \mathcal{F}, \qquad (7) $$







where $e$ is the vector of ones (of the appropriate dimension), and $\operatorname{Diag}(u)$
denotes the diagonal matrix formed from the vector $u$. For each $u$ we get
an upper bound by ignoring the constraints and by allowing the diagonal
perturbations, i.e. we have

$$ \mu^* \le f_0(u) := \max_x q_u(x). \qquad (8) $$

We then find the bound

$$ \mu^* \le B_0 := \min_u f_0(u). \qquad (9) $$
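The reason the diagonal perturbation gives valid upper bounds is that $q_u$ and $q_0$ agree on every ±1 vector, while they differ elsewhere. A minimal check of this, with data $Q$, $c$, $u$ chosen arbitrarily here for illustration, might look as follows.

```python
from itertools import product

def q0(Q, c, x):
    """q_0(x) = x^T Q x - 2 c^T x."""
    n = len(x)
    quad = sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return quad - 2 * sum(c[i] * x[i] for i in range(n))

def qu(Q, c, u, x):
    """q_u(x) = x^T (Q + Diag(u)) x - 2 c^T x - u^T e."""
    n = len(x)
    return q0(Q, c, x) + sum(u[i] * x[i] ** 2 for i in range(n)) - sum(u)

# Arbitrary illustrative data.
Q = [[0, -1], [-1, 0]]
c = [1, 0]
u = [3, -5]

# The perturbation sum_i u_i (x_i^2 - 1) vanishes on {+1, -1}^n ...
diffs = [qu(Q, c, u, x) - q0(Q, c, x) for x in product((1, -1), repeat=2)]
# ... but not at an infeasible point such as x = (2, 0).
off_feasible = qu(Q, c, u, (2, 0)) - q0(Q, c, (2, 0))
```

Since $q_u = q_0$ on $\mathcal{F}$, maximizing $q_u$ over any set containing $\mathcal{F}$ can only overestimate $\mu^*$, for every choice of $u$.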



Let

$$ S := \{ u \mid u^T e = 0, \ Q + \operatorname{Diag}(u) \preceq 0 \}. $$

Note that, if the set $S$ is not empty, then we can minimize over the
unconstrained parameter $u$, or we can add the restriction $u^T e = 0$, i.e.

$$ B_0 = \min_{u^T e = 0} f_0(u), \quad \text{if } S \ne \emptyset. $$

This can be seen from the equivalence of the optimality conditions for
min-max problems, e.g. [26, Pg 188, Theorem 2.1], or from a perturbation
analysis of min-max problems, e.g. [29]. For example, if the solution of
the inner maximization problem is attained at a unique $x$, then $f_0$ is
differentiable and $\nabla f_0(u) = x \circ x - e$, where $x \circ x$ is the Hadamard
(elementwise) product. In general, the function $f_0(u)$ is directionally
differentiable at each point $u$ (with $u^T e = 0$ in the restricted case), in
any direction $h$. Assuming $\|h\| = 1$, the directional derivative has the
value

$$
\partial f_0(u; h) = \max_{x \in X(u)} \left\langle \frac{\partial q_u(x)}{\partial u}, h \right\rangle
= \max_{x \in X(u)} \langle x \circ x - e, h \rangle,
$$

where $X(u)$ denotes the set of values of $x$ that solve the inner problem
(8) for given $u$. One can now compare stationarity for $f_0$ in the
unconstrained case with the use of a Lagrange multiplier for the constraint
$u^T e = 0$. This involves the subgradient of the convex function $f_0$ and
its domain, i.e. the set where $f_0$ is finite valued. Details can be found
in [92]. The comment about the possible use of $u^T e = 0$ is also true for
the bounds that follow.

But the function $f_0$ can take on the value $+\infty$. We can avoid these
infinite values by restricting the parameter $u$, using a hidden
semidefinite constraint. Specifically, this constraint depends on the remark that
a quadratic function is bounded above if and only if its Hessian is
negative semidefinite and its stationarity equation is consistent. Thus we
obtain the following bound, which is tractable since we minimize a
convex function over a convex set:

$$ \mu^* \le B_0 = \min_{Q + \operatorname{Diag}(u) \preceq 0} f_0(u). \qquad (10) $$



Next we relax the conditions on $x$ to the sphere of radius $\sqrt{n}$, which
provides

$$ \mu^* \le f_1(u) := \max_{\|x\|^2 = n} q_u(x). \qquad (11) $$

Hence our next bound is

$$ \mu^* \le B_1 := \min_u f_1(u). \qquad (12) $$

The inner maximization problem is called a trust region subproblem and
is tractable, as shown in Section 2.3 below. Thus we have our second
tractable bound.

We can replace the spherical constraint with a box constraint, which
gives

$$ \mu^* \le f_2(u) := \max_{|x_i| \le 1} q_u(x). \qquad (13) $$

After adding the semidefinite constraint on $u$ to make the calculation of
$f_2$ tractable, we obtain our next bounds

$$ \mu^* \le \min_u f_2(u) \qquad (14) $$

and

$$ \mu^* \le B_2 := \min_{Q + \operatorname{Diag}(u) \preceq 0} f_2(u). \qquad (15) $$



Given $Q$ and $c$ of the function (7), we define the $(n+1) \times (n+1)$
matrix $Q^c$ by adding a leading row and column, so $Q^c$ has the elements

$$ Q^c_{00} = 0, \quad Q^c_{0i} = Q^c_{i0} = -c_i \ \text{for } i > 0, \quad \text{and} \quad Q^c_{ij} = Q_{ij} \ \text{for } i, j > 0, $$

i.e.

$$ Q^c := \begin{pmatrix} 0 & -c^T \\ -c & Q \end{pmatrix}. \qquad (16) $$

Further, in order to have functions $q^c_u(y)$ and $f^c_1(u)$ that are analogous
to the previous cases, we introduce

$$ q^c_u(y) := y^T (Q^c + \operatorname{Diag}(u)) y - u^T e, \quad y \in \Re^{n+1}, \qquad (17) $$

where $u$ and $e$ are also in $\Re^{n+1}$. Note that $q^c_u$ reduces to $q_u$ if the first
component of $y$ is $y_0 = 1$. The equivalent relaxed problem is

$$ \mu^* \le f^c_1(u) := \max_{\|y\|^2 = n+1} q^c_u(y) = (n+1) \, \lambda_{\max}(Q^c + \operatorname{Diag}(u)) - u^T e, \qquad (18) $$

where $\lambda_{\max}(A)$ denotes the maximum eigenvalue of $A$, say. Thus another
bound on $\mu^*$ is

$$ \mu^* \le B^c_1 := \min_u f^c_1(u). \qquad (19) $$

Similarly, we get equivalent bounds and homogenized bounds for the
other models.
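The homogenized eigenvalue bound can be evaluated by hand when $n = 1$, because the bordered matrix is then 2x2 and its largest eigenvalue has a closed form. The sketch below is an illustration under assumptions made here: the scalar data $q = 3$, $c = 2$ and the two trial vectors $u$ are arbitrary, and the function names are hypothetical.

```python
import math

def lambda_max_2x2(a, b, d):
    """Largest eigenvalue of the symmetric matrix [[a, b], [b, d]]."""
    return 0.5 * (a + d) + math.sqrt((0.5 * (a - d)) ** 2 + b * b)

def homogenized_bound(q, c, u0, u1):
    """(n+1) * lambda_max(Q^c + Diag(u)) - u^T e for n = 1,
    where Q^c = [[0, -c], [-c, q]] and u = (u0, u1)."""
    return 2 * lambda_max_2x2(u0, -c, q + u1) - (u0 + u1)

q, c = 3.0, 2.0
# Exact value of max{q*x^2 - 2*c*x : x in {+1, -1}}.
exact = max(q * x * x - 2 * c * x for x in (1, -1))
bound_at_zero = homogenized_bound(q, c, 0.0, 0.0)    # u = 0
bound_balanced = homogenized_bound(q, c, q, 0.0)     # equal diagonal entries
```

Here the unperturbed choice $u = 0$ gives the bound 8, while the choice $u = (q, 0)$, which equalizes the diagonal of $Q^c + \operatorname{Diag}(u)$, already reproduces the exact value 7; this shows on a toy case why the minimization over $u$ matters.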

The above argument shows that we can homogenize the problem by
moving into a higher dimension. Therefore we can assume the special
case c = 0. We now look at the SDP bound. The relaxation comes from
the fact that the trace has the commutative property

$$ x^T Q x = \operatorname{Trace} x^T Q x = \operatorname{Trace} Q x x^T, $$

and, for $x \in \mathcal{F}$, the matrix $Y$ with elements $Y_{ij} = x_i x_j$ is symmetric,
rank one and positive semidefinite, with ones on the diagonal. Therefore
we can lift the problem into the higher dimensional space of symmetric
matrices and relax the rank one constraint. Thus we obtain a relaxation
that gives the bound

$$
(MCSDP) \qquad
\begin{array}{rl}
B_3 := \max & \operatorname{Trace} QY \\
\text{s.t.} & \operatorname{diag}(Y) = e \ \text{and} \ Y \succeq 0, \quad Y \in \mathcal{S}^n,
\end{array}
\qquad (20)
$$

where $\operatorname{diag}(Y)$ is the vector formed from the diagonal of $Y$. This SDP
is a convex programming problem and is tractable.
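The lifting step can be made concrete by forming $Y = x x^T$ for a ±1 vector and checking the properties used above. The data below are arbitrary illustrative choices and the helper names are assumptions introduced here.

```python
def outer(x):
    """Y = x x^T as a nested list."""
    return [[xi * xj for xj in x] for xi in x]

def trace_product(Q, Y):
    """Trace(QY) = sum_ij Q_ij * Y_ji."""
    n = len(Q)
    return sum(Q[i][j] * Y[j][i] for i in range(n) for j in range(n))

def quad_form(Q, x):
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

Q = [[0, -1, 2],
     [-1, 0, 1],
     [2, 1, 0]]
x = [1, -1, 1]
Y = outer(x)

diagonal = [Y[i][i] for i in range(3)]
# For a rank one matrix every 2x2 minor vanishes, e.g. the leading one.
leading_minor = Y[0][0] * Y[1][1] - Y[0][1] * Y[1][0]
```

The identity matrix also satisfies $\operatorname{diag}(Y) = e$ and $Y \succeq 0$ but is not of the form $x x^T$, which illustrates that dropping the rank one condition genuinely enlarges the feasible set.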

We have presented several different tractable bounds that have simple
geometric interpretations. It is not at all clear which bounds are better
or how to compare them. We now do something that may seem
meaningless; we replace the ±1 constraints with $x_i^2 = 1$, $i = 1, \dots, n$, which
does not change the feasible set of the original problem. In [92, 91] it is
shown, however, that all the above relaxations and bounds for MC come
from the Lagrangian dual of the following equivalent problem to MCQ:

$$
(P_E) \qquad
\begin{array}{rl}
\max & q_0(x) = x^T Q x - 2 c^T x \\
\text{s.t.} & x_i^2 = 1, \quad i = 1, \dots, n.
\end{array}
\qquad (21)
$$

Thus we enforce our theme about the strength of Lagrangian relaxation.
The strong duality result for the trust region subproblem is the key to
the proofs. Note that the Lagrangian dual of $P_E$ yields precisely our
first bound $B_0$ on $\mu^*$, given in (9).

Theorem 2.1 All the bounds for MCQ discussed above are equal to the
optimal value of the Lagrangian dual of the equivalent program $P_E$. ■



2.2. GENERAL Q²P

We now move on to applying the Lagrangian relaxation to general
quadratic constrained quadratic problems, denoted Q²P. The general
Q²P problem is also studied in [33, 52, 96, 58, 56, 68, 59], for example.

Quadratic bounds using a Lagrangian relaxation have received much 
attention and been applied in the literature, for example in [53] and, 
more recently, in [54]. The latter calls the Lagrangian relaxation the 
“best convex bound” . Discussions on Lagrangian relaxation for noncon- 
vex programs also appear in [31]. More references are given throughout 
this paper. 



Remark 2.1 Any equality constraints are written as two inequality
constraints; any linear equality constraints, $Ax = b$, are transformed to the
quadratic constraint $\|Ax - b\|^2 = 0$. The reason for these
transformations for linear equality constraints is discussed in [91]. It is that the
Lagrangian dual essentially ignores linear constraints, as can be seen
from: $-\infty = \max_\lambda \min_x -x^2 + \lambda x$, which is the dual of the problem
$\min\{ -x^2 : x = 0 \}$.
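The effect described in this remark can be worked out explicitly for the one-dimensional problem $\min\{-x^2 : x = 0\}$, whose optimal value is 0. The comparison below between the linear form $x = 0$ and the quadratic form $x^2 = 0$ of the same constraint is a worked illustration added here:

```latex
\begin{align*}
\text{linear constraint } x = 0: \quad
  & \min_x \; -x^2 + \lambda x = -\infty \ \text{ for every } \lambda,
  \quad\text{so}\quad \max_\lambda \min_x \; -x^2 + \lambda x = -\infty; \\
\text{quadratic constraint } x^2 = 0: \quad
  & \min_x \; -x^2 + \lambda x^2 = \min_x \; (\lambda - 1)\, x^2 =
  \begin{cases} 0, & \lambda \ge 1, \\ -\infty, & \lambda < 1, \end{cases} \\
  & \text{so}\quad \max_\lambda \min_x \; (\lambda - 1)\, x^2 = 0 .
\end{align*}
```

With the quadratic form of the constraint the multiplier can convexify the Lagrangian, and the dual value matches the primal value 0; with the linear form it cannot, because a linear term never changes the Hessian.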

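We now recall from Section 1.1 the Q²P in $x$:

$$
(Q^2P_x) \qquad
\begin{array}{rl}
\mu^* := \min & q_0(x) := x^T Q_0 x + 2 g_0^T x + \alpha_0 \\
\text{s.t.} & q_k(x) := x^T Q_k x + 2 g_k^T x + \alpha_k \le 0, \quad
k \in \mathcal{I} := \{1, \dots, m\},
\end{array}
\qquad (22)
$$

where the matrices $Q_k$ are symmetric. The feasible set is

$$ \mathcal{F} := \{ x \in \Re^n : q_k(x) \le 0, \ \forall k \in \mathcal{I} \}. $$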



Note that, though the feasible set $\mathcal{F}$ may be empty, the feasible set of
the relaxation may not be. The objective function and the constraints 
need not be convex. Therefore the feasible set can have “nasty” features 
that cause the problem to be very hard to solve in general, see e.g. [81]. 

Let

$$ P_k := \begin{pmatrix} \alpha_k & g_k^T \\ g_k & Q_k \end{pmatrix}, \qquad (23) $$

and, by abuse of notation, define

$$ q_k(y) := y^T P_k y, \quad k = 0, 1, \dots, m. $$

Then an equivalent formulation of $Q^2P_x$ is the homogenized problem

$$
(Q^2P_y) \qquad
\begin{array}{rl}
\mu^* = \min & q_0(y) \\
\text{s.t.} & q_k(y) \le 0, \quad k \in \mathcal{I}, \\
& y_0^2 = 1.
\end{array}
$$

We see that the optimal values of $Q^2P_x$ and $Q^2P_y$ are equal. Further, if
$y_0 = -1$ is optimal, then we can replace $y$ by $-y$, because the objective
function and all but the last constraint are homogeneous.
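The homogenization can be verified numerically: with $y = (1, x)$ the bordered quadratic form reproduces the original quadratic, and flipping the sign of $y$ leaves the value unchanged. The sketch below assumes that $y$ stacks $y_0$ on top of $x$, which is how the bordered matrix is laid out; the data and function names are illustrative choices made here.

```python
def q(Qm, g, alpha, x):
    """q(x) = x^T Q x + 2 g^T x + alpha."""
    n = len(x)
    quad = sum(Qm[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return quad + 2 * sum(g[i] * x[i] for i in range(n)) + alpha

def build_P(Qm, g, alpha):
    """P = [[alpha, g^T], [g, Q]] as an (n+1) x (n+1) nested list."""
    return [[alpha] + list(g)] + [[g[i]] + list(Qm[i]) for i in range(len(g))]

def q_hom(P, y):
    """Homogeneous form y^T P y."""
    m = len(y)
    return sum(P[i][j] * y[i] * y[j] for i in range(m) for j in range(m))

# Arbitrary illustrative data; Qm is indefinite.
Qm = [[1, 2], [2, -3]]
g = [1, -1]
alpha = 4
P = build_P(Qm, g, alpha)

x = [2, 5]
y = [1] + x                   # y_0 = 1
minus_y = [-t for t in y]     # y_0 = -1, same objective value
```

Because $y^T P y$ is even in $y$, the points $(1, x)$ and $(-1, -x)$ are indistinguishable for the objective and for the homogeneous constraints, which is the sign ambiguity discussed in the text.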

We will refer to both equivalent formulations of Q²P in the sequel.
The relevant one will be clear from the context. 



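Remark 2.2 Note that we could replace the constraint $y_0^2 = 1$ by $y_0 = 1$,
as in [53]. In the case $y_0 = 1$ the feasible sets of the two
formulations coincide exactly, while in the former case they can differ by a sign.
Specifically, $x \in \mathcal{F}$ implies that both

$$ \begin{pmatrix} 1 \\ x \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} -1 \\ -x \end{pmatrix} \quad \text{are in } \mathcal{F}_y, $$

where $\mathcal{F}_y$ is the feasible region of the homogenized problem $Q^2P_y$.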



The Lagrangian Relaxation of a General Q²P. The Lagrangian
relaxation of the homogenized problem $Q^2P_y$ provides a simple technique
for obtaining the SDP relaxation. In addition, an application of the
strong duality result for the trust region subproblem shows that the
SDP and Lagrangian relaxation are equal. The problem $Q^2P_y$ has the
Lagrangian

$$ L(y, \sigma, \lambda) := y^T P_0 y - \sigma (y_0^2 - 1) + \sum_{k \in \mathcal{I}} \lambda_k y^T P_k y, $$

and the Lagrangian relaxation of $Q^2P_y$ is

$$ (DQ^2P_y) \qquad \nu^* := \max_{\sigma, \ \lambda \ge 0} \min_y \ y^T P_0 y - \sigma (y_0^2 - 1) + \sum_{k \in \mathcal{I}} \lambda_k y^T P_k y. $$

Note that

$$
\begin{array}{rcl}
\nu^* &=& \max_{\lambda \ge 0} \max_\sigma \min_y \ y^T P_0 y - \sigma (y_0^2 - 1) + \sum_{k \in \mathcal{I}} \lambda_k y^T P_k y \\
&=& \max_{\lambda \ge 0} \min_{y : \, y_0^2 = 1} \ y^T P_0 y + \sum_{k \in \mathcal{I}} \lambda_k y^T P_k y,
\end{array}
$$

from strong duality of the trust region subproblem [104]. Therefore, we
get equivalence of the dual values for the problems $DQ^2P_y$ and



$$ (DQ^2P_x) \qquad \nu^* = \max_{\lambda \ge 0} \min_x \ q_0(x) + \sum_{k \in \mathcal{I}} \lambda_k q_k(x), $$

which is similar to the approaches in [116, 103]. It follows that weak
duality

$$ \mu^* \ge \nu^* \ge \min_y \ y^T P_0 y - \sigma (y_0^2 - 1) + \sum_{k \in \mathcal{I}} \lambda_k y^T P_k y $$

holds for every $\sigma$ and every $\lambda \ge 0$. Therefore, if the optimal $\sigma^*$ and
$\lambda^*$ can be found, we have derived a single quadratic function whose
minimal value approximates the original minimal value $\mu^*$ of $Q^2P_y$, i.e.

$$ \mu^* \ge \nu^* = \min_y \ y^T P_0 y - \sigma^* (y_0^2 - 1) + \sum_{k \in \mathcal{I}} \lambda_k^* y^T P_k y. \qquad (24) $$

Moreover, in the dual program $DQ^2P_y$, the Lagrangian is a quadratic
function of $y$. Therefore the outer maximization problem has not only
nonnegativity constraints but also the hidden semidefinite constraint

$$ P_0 - \sigma E_{00} + \sum_{k \in \mathcal{I}} \lambda_k P_k \succeq 0, \quad \lambda \ge 0, \qquad (25) $$

where $E_{00}$ is the zero matrix except for 1 in the top left corner. The
solution of the minimization subproblem is attained at $y = 0$. Therefore
the Lagrangian dual is equivalent to the SDP problem

$$
(DSDP) \qquad
\begin{array}{rl}
\nu^* := \max & \sigma \\
\text{s.t.} & \sigma E_{00} - \sum_{k \in \mathcal{I}} \lambda_k P_k \preceq P_0, \\
& \lambda \ge 0.
\end{array}
$$

Valid Inequalities. Using the above approach, we see that more 
constraints qk{y) give a stronger dual. Therefore the addition of re- 
dundant constraints to get new valid inequalities can strengthen the 
relaxation. We will see how this occurs when we look at orthogonally 
constrained problems below. Another approach is also specified in detail 
in [33] and [52, 51]. 

For problems that also have linear inequality constraints, one can use 
the notion of copositivity to strengthen the SDP relaxation. However, 
this does not result in a tractable relaxation in general [94]. 

Specific instances of these relaxations (graph partitioning and quad- 
ratic assignment problems) appear in [121, 112]. We will present a recipe 
for generating a relaxation in Section 3. 

2.3. STRONG DUALITY

In the case of strong duality (zero duality gap and dual attainment),
our bounds are exact. As expected, this holds (generically) in the general
convex case. Surprisingly, there are several cases of nonconvex quadratic
programs where it holds as well. In this Section 2.3 we amplify on our 
theme that confirms the strength of Lagrangian relaxation, namely that 
a tractable bound implies a Lagrangian relaxation is at work. 

We recall the general quadratically constrained quadratic program 
(22), where for simplicity we have replaced each equality constraint by 
two inequality constraints. We will use equality constraints when abso- 
lutely required. We let T denote the feasible set. 

We define the Lagrangian

$$ L(x, \lambda) := q_0(x) + \sum_{k=1}^m \lambda_k q_k(x), $$

and the dual functional

$$ \varphi(\lambda) := \min_x L(x, \lambda). $$

The Lagrangian is linear in $\lambda$, and so the dual functional is a concave
function of $\lambda$. Thus the calculation of the maximum of this concave
function is a tractable problem if $\varphi(\lambda)$ can be evaluated efficiently. For
each $\lambda \ge 0$, we have the lower bound

$$
\begin{array}{rcl}
\mu^* = \min_{x \in \mathcal{F}} q_0(x) &\ge& \min_{x \in \mathcal{F}} L(x, \lambda) \\
&\ge& \min_x L(x, \lambda) = \varphi(\lambda).
\end{array}
$$

Thus we deduce the dual problem

$$ \mu^* \ge \nu^* := \max_{\lambda \ge 0} \varphi(\lambda), $$

which provides a lower bound for the primal problem. If, in
addition, we find a feasible $x \in \mathcal{F}$ with attainment in the Lagrangian,
$x \in \operatorname{argmin}_y L(y, \lambda)$, and with complementary slackness
$\sum_{k=1}^m \lambda_k q_k(x) = 0$, then

$$
\begin{array}{rcl}
\mu^* \ge \nu^* &\ge& L(x, \lambda) \\
&=& q_0(x) \ge \mu^*.
\end{array}
$$

Therefore $x$ is optimal and the duality gap is zero when these sufficiency
conditions (feasibility, attainment, complementary slackness) hold. Note
that, since we are dealing with an unconstrained minimum of a quadratic
Lagrangian, we obtain the interesting statement: the given conditions
hold only if the Lagrangian is stationary and its Hessian is positive
semidefinite. Further, when these two conditions are incompatible, we
lose strong duality, and can even expect a duality gap.
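The weak duality chain and the sufficiency conditions can be traced on a one-dimensional convex instance where the dual functional is available in closed form. The instance $q_0(x) = x^2 + 2x$, $q_1(x) = x^2 - 1/4$ and the grid of multipliers below are choices made here purely for illustration.

```python
def phi(lam):
    """Dual functional for q0(x) = x^2 + 2x and q1(x) = x^2 - 1/4:
    L(x, lam) = (1 + lam) x^2 + 2x - lam/4 is minimized at x = -1/(1 + lam)."""
    return -1.0 / (1.0 + lam) - lam / 4.0

def lagrangian_minimizer(lam):
    return -1.0 / (1.0 + lam)

# Primal optimum: q0 is increasing on [-1/2, 1/2], so x = -1/2 is optimal.
x_opt = -0.5
mu_star = x_opt ** 2 + 2 * x_opt

grid = [k / 100.0 for k in range(0, 501)]   # multipliers in [0, 5]
best_lam = max(grid, key=phi)

# Complementary slackness at the best multiplier.
slack = best_lam * (lagrangian_minimizer(best_lam) ** 2 - 0.25)
```

Every grid value of the dual functional stays below $\mu^* = -3/4$, and at $\lambda = 1$ the Lagrangian minimizer is the feasible point $x = -1/2$ with zero complementary slackness, so all three sufficiency conditions hold and the bound is tight.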

We now present several Q²P problems where the Lagrangian
relaxation is important and well known. In all these cases, the Lagrangian
dual provides an important theoretical tool for algorithmic development, 
even if the duality gap may be nonzero. We continue to emphasize our 
theme that the Lagrangian relaxation is best. 

Convex Quadratic Programs. We start with the easy case;
consider the convex quadratic program

$$
(CQP) \qquad
\begin{array}{rl}
\mu^* := \min & q_0(x) \\
\text{s.t.} & q_k(x) \le 0, \quad k = 1, \dots, m,
\end{array}
$$

where all the functions $q_i(x)$ are convex and quadratic. The following
remarks show that Lagrangian duality can always solve this problem.
The dual is

$$ (DCQP) \qquad \nu^* := \max_{\lambda \ge 0} \min_x \ q_0(x) + \sum_{k=1}^m \lambda_k q_k(x). $$

If $\nu^*$ is attained at $\lambda = \lambda^*$ and $x = x^*$, then sufficient conditions for $x^*$ to
be optimal for CQP are primal feasibility and complementary slackness

$$ \sum_{k=1}^m \lambda_k^* q_k(x^*) = 0. $$

It is also well known that the Karush-Kuhn-Tucker (KKT) conditions 
are sufficient for global optimality, and, under an appropriate constraint 
qualification, they are also necessary. Therefore strong duality holds if 
a constraint qualification is satisfied, and then there is no duality gap 
and the dual is attained. 

Further, if the primal value of CQP is bounded then it is attained
and there is no duality gap, see [109, 89, 90, 88]. This assertion can
be regarded as an extension of the Frank-Wolfe Theorem [68].
Surprisingly, however, the dual may not be attained. For example, the convex
program

$$ 0 = \min\{ x : x^2 \le 0 \} \qquad (26) $$

has the (unattained) dual

$$ 0 = \max_{\lambda \ge 0} \min_x \ x + \lambda x^2 = \max_{\lambda > 0} \min_x \ x + \lambda x^2. $$
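The lack of dual attainment in this example can be seen by evaluating the inner minimization explicitly; the short calculation below is a worked illustration added here:

```latex
\begin{align*}
\lambda > 0: \quad & \min_x \; x + \lambda x^2 = -\frac{1}{4\lambda},
  \quad \text{attained at } x = -\frac{1}{2\lambda}, \\
\lambda = 0: \quad & \min_x \; x = -\infty, \\
\text{hence} \quad & \sup_{\lambda \ge 0} \Bigl( \min_x \; x + \lambda x^2 \Bigr)
  = \sup_{\lambda > 0} \Bigl( -\frac{1}{4\lambda} \Bigr) = 0 .
\end{align*}
```

The value 0 is approached only as $\lambda \to \infty$, so no finite multiplier attains it, although there is no duality gap. The feasible set $\{x : x^2 \le 0\} = \{0\}$ has no strictly feasible point, so a Slater-type constraint qualification fails here.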

Algorithmic approaches based on Lagrangian duality appear in [48, 67,
77], for instance.

Rayleigh Quotients. Suppose that $A = A^T \in \mathcal{S}^n$. It is well known
that the smallest eigenvalue $\lambda_1$ of $A$ is the Rayleigh quotient

$$ \lambda_1 = \min\{ x^T A x : x^T x = 1 \}. \qquad (27) $$

Since $A$ is not necessarily positive semidefinite, this form may require the
minimization of a nonconvex function on a nonconvex set. The Rayleigh
quotient, however, forms the basis of many very efficient algorithms for
finding the smallest eigenvalue. It is easy to deduce that there is no
duality gap for this nonconvex problem, by using the equation

$$ \lambda_1 = \max_\lambda \min_x \ x^T A x - \lambda (x^T x - 1) = \max_{A - \lambda I \succeq 0} \lambda. \qquad (28) $$

Specifically, the inner minimization problem in (28) is unconstrained.
Therefore the outer maximization problem has the hidden semidefinite
constraint (an ongoing theme in this paper)

$$ A - \lambda I \succeq 0, $$

which requires $\lambda$ to be at most the smallest eigenvalue of $A$. With $\lambda$ set
to the smallest eigenvalue, the inner minimization yields either x = 0 or 
the eigenvector corresponding to Ai, the corresponding value of the inner 
objective function being Ai in both cases. Thus, we have an example of 
a nonconvex problem for which strong duality holds. Note that the prob- 
lem (27) has a special norm constraint, and a homogeneous quadratic 
objective. Thus, this nonconvex problem can be solved efficiently. Fur- 
thermore, strong duality holds for the Lagrangian dual, which supports 
our theme. 
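For a 2x2 symmetric matrix both sides of the Rayleigh quotient duality can be computed with elementary means: the eigenvalues have a closed form, the primal can be sampled on the unit circle, and the hidden constraint can be tested through the 2x2 semidefiniteness criterion. The matrix below (with eigenvalues -3 and 2) and the helper names are illustrative assumptions made here.

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenvalues (smallest, largest) of [[a, b], [b, d]]."""
    mid = 0.5 * (a + d)
    rad = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return mid - rad, mid + rad

def rayleigh_min_on_circle(a, b, d, samples=100000):
    """Approximate min of x^T A x over the unit circle by sampling angles."""
    best = float("inf")
    for k in range(samples):
        t = math.pi * k / samples          # x and -x give the same value
        x1, x2 = math.cos(t), math.sin(t)
        best = min(best, a * x1 * x1 + 2 * b * x1 * x2 + d * x2 * x2)
    return best

def shift_is_psd(a, b, d, lam):
    """Hidden constraint: is A - lam*I positive semidefinite?"""
    p, r = a - lam, d - lam
    return p >= 0 and r >= 0 and p * r - b * b >= 0

a, b, d = 1.0, 2.0, -2.0        # indefinite matrix
lam1, lam2 = eig_sym_2x2(a, b, d)
primal = rayleigh_min_on_circle(a, b, d)
```

Multipliers up to $\lambda_1$ satisfy the hidden constraint and larger ones do not, so the dual value is $\lambda_1$, in agreement with the sampled primal minimum.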



Trust Region Subproblems. We are going to see that strong 
duality holds for a larger class of apparently nonconvex problems. The 
trust region subproblem, TRS, is the minimization of a quadratic func- 
tion subject to a norm constraint, where no convexity or homogeneity 
of the objective function is assumed. We make a further extension, i.e. 
we do not assume convexity of the constraint, so both the objective and 
constraint functions are allowed to be indefinite quadratics. Some ap- 
plications of indefinite quadratic forms are given in [22]. This problem 
is important in nonlinear programming, e.g. [75, 74]. Specifically, TRS
has the form

$$
(TRS) \qquad
\begin{array}{rl}
\mu^* := \min & q_0(x) = x^T Q_0 x - 2 c_0^T x \\
\text{s.t.} & x^T x - \delta^2 \le 0 \ \ (\text{or} = 0),
\end{array}
$$

and the generalized trust region subproblem [104, 73] is

$$
(GTRS) \qquad
\begin{array}{rl}
\mu^* := \min & q_0(x) = x^T Q_0 x - 2 c_0^T x \\
\text{s.t.} & q_1(x) \le 0 \ \ (\text{or} = 0),
\end{array}
$$

where $q_1$ is another quadratic function. Further, one can have two sided
constraints $\alpha \le q_1(x) \le \beta$, which occur in some trust region algorithms
as well.

For TRS, assuming that the constraint is written as an inequality ($\le$),
the Lagrangian dual is:

$$ (DTRS) \qquad \nu^* := \max_{\lambda \ge 0} \min_x \ q_0(x) + \lambda (x^T x - \delta^2). $$

An equivalent problem (see [104]) is the (concave) nonlinear semidefinite
program

$$
(DTRS) \qquad
\begin{array}{rl}
\nu^* := \sup & -c_0^T (Q_0 + \lambda I)^\dagger c_0 - \lambda \delta^2 \\
\text{s.t.} & Q_0 + \lambda I \succeq 0, \quad \lambda \ge 0,
\end{array}
$$

where the superscript $\dagger$ denotes the Moore-Penrose generalized inverse.
It is shown in [104] that there is a zero duality gap for TRS, $\mu^* = \nu^*$.
(The primal is attained though the dual may not be, as in example (26)
above.) Thus, as in the eigenvalue calculation, we have an example of a
nonconvex program where strong duality holds. Therefore this problem
can be solved efficiently, polynomial time results being presented in [117].
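The zero duality gap for TRS can be observed numerically on a small nonconvex instance. The sketch below makes several simplifying assumptions that are not part of the text: $Q_0$ is diagonal (so the generalized inverse is a plain reciprocal once $Q_0 + \lambda I$ is positive definite), the dimension is 2, the primal is approximated by sampling the boundary circle (legitimate here because $Q_0$ is indefinite, so the minimum lies on the boundary), and the concave dual is maximized by a simple ternary search. All names and data are illustrative.

```python
import math

def trs_dual(d, c, delta, lam):
    """h(lam) = -sum_i c_i^2 / (d_i + lam) - lam * delta^2 for diagonal
    Q_0 = Diag(d); valid when every d_i + lam > 0."""
    return -sum(ci * ci / (di + lam) for di, ci in zip(d, c)) - lam * delta ** 2

def maximize_concave(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

def trs_primal_on_circle(d, c, delta, samples=200000):
    """Sampled min of x^T Q_0 x - 2 c^T x over the circle ||x|| = delta (n = 2)."""
    best = float("inf")
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        x = (delta * math.cos(t), delta * math.sin(t))
        val = sum(di * xi * xi - 2.0 * ci * xi for di, ci, xi in zip(d, c, x))
        best = min(best, val)
    return best

d, c, delta = (-1.0, 2.0), (1.0, 1.0), 1.0     # Q_0 = Diag(-1, 2) is indefinite
lam_lo = max(0.0, -min(d)) + 1e-9              # keeps Q_0 + lam*I positive definite
lam_star = maximize_concave(lambda lam: trs_dual(d, c, delta, lam),
                            lam_lo, lam_lo + 100.0)
nu_star = trs_dual(d, c, delta, lam_star)
mu_star = trs_primal_on_circle(d, c, delta)
```

On this instance the dual maximizer lies strictly above $-\lambda_{\min}(Q_0) = 1$, the dual is attained, and the sampled primal value agrees with the dual value to within the sampling accuracy.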

We include a short proof of strong duality for the inequality con- 
strained case, based on the outline in [62], which applies a convex case 
result after a perturbation. Note that the key to the proof is being able 
to pass between the inequality and equality constraints. 



Proof.

Without loss of generality, we assume that TRS is nonconvex, because
otherwise we apply the convex results discussed above. Therefore $\mu^*$ is
attained on the boundary of the feasible set and the smallest eigenvalue
of $Q_0$, denoted $\gamma$, is negative. Thus TRS has the required property

$$
\begin{array}{rcll}
\mu^* &=& \min_{x^T x \le \delta^2} \ x^T (Q_0 - \gamma I) x - 2 c_0^T x + \gamma x^T x & \\
&=& \min_{x^T x = \delta^2} \ x^T (Q_0 - \gamma I) x - 2 c_0^T x + \gamma x^T x & (Q_0 \text{ is indefinite}) \\
&=& \min_{x^T x = \delta^2} \ x^T (Q_0 - \gamma I) x - 2 c_0^T x + \gamma \delta^2 & \\
&=& \min_{x^T x \le \delta^2} \ x^T (Q_0 - \gamma I) x - 2 c_0^T x + \gamma \delta^2 & (Q_0 - \gamma I \text{ is singular}) \\
&=& \max_{\lambda \ge 0} \min_x \ x^T (Q_0 - \gamma I) x - 2 c_0^T x + \lambda (x^T x - \delta^2) + \gamma \delta^2 & (\text{convex case}) \\
&=& \max_{\lambda \ge 0} \min_x \ x^T Q_0 x - 2 c_0^T x + (\lambda - \gamma)(x^T x - \delta^2) & \\
&\le& \max_{\lambda \ge \gamma} \min_x \ x^T Q_0 x - 2 c_0^T x + (\lambda - \gamma)(x^T x - \delta^2) & (\gamma < 0) \\
&=& \nu^* \le \mu^*. &
\end{array}
\qquad (29)
$$

As mentioned above, extensions of this result to a two-sided, general,
possibly nonconvex, constraint are discussed in [104, 73]. An algorithm 
based on Lagrangian duality appears in [98] and (implicitly) in [74, 99]. 
Such algorithms are highly efficient for the TRS problem, since they 
solve it almost as quickly as an eigenvalue problem. 

This efficiency when the objective and constraint may be nonconvex is 
surprising. In fact, Martinez [70] shows that the TRS can have at most 
one local and nonglobal optimum, and that the Lagrangian at this point 
has one negative eigenvalue. Therefore, it is even more surprising that 
the Lagrangian dual (relaxation) allows one to find the global minimum 
without ever getting trapped near the local minimum. 

In fact, for GTRS we still have a zero duality gap, though strong
duality may fail, as shown in (26). The results in [104] provide strong
duality for GTRS with a two-sided constraint $\alpha \le q_1(x) \le \beta$, the
constraint qualification being $\alpha < \beta$. In [73], necessary and sufficient
optimality conditions are presented for GTRS with the constraint
qualification $\min q_0(x) < \max q_0(x)$. A combination of these results with the
extension of the Frank-Wolfe theorem [68] gives the following properties.

Theorem 2.2 Consider GTRS: a zero duality gap always holds and, if 
the optimal value is finite, then it is attained. ■ 



Two Trust Region Subproblem. The two trust region subprob- 
lem, TTRS, is the minimization of a (possibly nonconvex) quadratic 
function subject to a norm and a least squares constraint, i.e. two con- 
vex quadratic constraints. This problem arises in algorithms for general 
nonlinear programs that use a sequential quadratic programming ap- 
proach, and is often called the CDT problem, because it was introduced 
by Celis, Dennis and Tapia [21]. 

In contrast to the above single TRS, the TTRS can have a nonzero 
duality gap [86, 118, 119, 120], which is closely related to quadratic 
theorems of the alternative [25]. In addition, if the constraints are not 
convex, then the primal may not be attained (see [68]). 

As mentioned above, Martinez [70] shows that the TRS can have at 
most one local optimum that is nonglobal, the Lagrangian there having 
one negative eigenvalue. Therefore, if we have such a case and add 
another ball constraint that contains the local, nonglobal, optimum in 
its interior, so that this point becomes the global optimum, then we 
obtain a TTRS that does not have a zero duality gap due to the negative 
eigenvalue. It is uncertain what additional constraints are successful at 
closing this duality gap. In fact, it is still an open problem whether
TTRS is an NP-hard or a polynomial time problem. 

General Q²P. The general, possibly nonconvex, Q²P has many
applications in modeling and approximation theory, see e.g. the
applications to SQP methods in [58]. Examples of approximations to Q²P also
appear in [32].

The Lagrangian relaxation of a Q²P is equivalent to the SDP
relaxation, and is sometimes called the Shor relaxation [103]. The Lagrangian
relaxation can be written as an SDP if one takes into account the
hidden semidefinite constraint, the inner quadratic objective function
being bounded below only if the Hessian is positive semidefinite. The
SDP relaxation is then the Lagrangian dual of this semidefinite program.
It can also be obtained directly by lifting the problem into matrix space
using the identity $x^TQx = \operatorname{Trace} x^TQx = \operatorname{Trace} Qxx^T$, and relaxing $xx^T$
to a semidefinite matrix $X$.
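
The lifting identity is easy to check numerically. The short sketch below uses arbitrary random data (an assumption made purely for illustration) and confirms that the lifted matrix $xx^T$ is positive semidefinite of rank one, which is exactly the condition that the relaxation drops.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
Q = rng.standard_normal((n, n))
Q = (Q + Q.T) / 2              # symmetric, generally indefinite
x = rng.standard_normal(n)

X = np.outer(x, x)             # the lifted matrix xx^T
quad = x @ Q @ x               # x^T Q x
lifted = np.trace(Q @ X)       # Trace(Q xx^T)

eigs = np.linalg.eigvalsh(X)   # one positive eigenvalue, the rest (numerically) zero
rank = np.linalg.matrix_rank(X)
```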

The geometry of the original feasible set of Q²P can be related to
the feasible set of the SDP relaxation. The connection is through valid
quadratic inequalities, i.e. nonnegative (convex) combinations of the
quadratic constraints; see [33, 52] and our Section 2.2.

Orthogonally Constrained Programs with Zero Duality Gaps. 

Let $\mathcal{M}_{m,n}$ denote the space of $m \times n$ real matrices. We now follow the
approach in [6, 5, 4] and consider the orthonormal type constraint

\[ X^TX = I, \quad X \in \mathcal{M}_{m,n}, \]

sometimes known as the Stiefel manifold [28], and the trust region type
constraint

\[ X^TX \preceq I, \quad X \in \mathcal{M}_{m,n}. \]

Applications and algorithms for optimization on orthonormal sets of ma- 
trices are discussed in [28]. In this section we will show that, for m = n, 
strong duality holds for a certain (nonconvex) quadratic program defined 
over orthonormal matrices. Because of the similarity of the
orthonormality constraint to the norm constraint $x^Tx = 1$, the given results can
be viewed as a matrix generalization of the strong duality property (27)
for the Rayleigh Quotient problem.

Let $A$ and $B$ be $n \times n$ symmetric matrices, and consider the
orthonormally constrained homogeneous Q²P

\[
(\mathrm{QQP_O}) \qquad \mu^O := \min \ \operatorname{Trace} AXBX^T \quad \text{s.t.} \quad XX^T = I. \qquad (30)
\]
This problem can be solved exactly using Lagrange multipliers [43], or
using the classical Hoffman-Wielandt inequality [14], the solution being
as follows.

Proposition 2.1 Let the orthogonal diagonalizations of $A$ and $B$ be
$A = V\Sigma V^T$ and $B = U\Lambda U^T$, respectively, where the eigenvalues of $\Sigma$
and $\Lambda$ are in nonincreasing and nondecreasing order, respectively. Then
the optimal value of $\mathrm{QQP_O}$ is $\mu^O = \operatorname{Trace} \Sigma\Lambda$, and the optimal matrix
$X$ is the product $VU^T$, which is obtained from the orthogonal matrices
of the diagonalizations. ■
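
Proposition 2.1 translates directly into a few lines of linear algebra. The sketch below is illustrative only: the helper name `qqp_o_solution` and the random test matrices are assumptions. It builds $X = VU^T$ from the two eigendecompositions and compares the resulting value with randomly sampled orthogonal matrices.

```python
import numpy as np

def qqp_o_solution(A, B):
    """Closed-form minimizer of Trace(A X B X^T) over orthogonal X."""
    sig, V = np.linalg.eigh(A)        # eigh returns ascending eigenvalues
    sig, V = sig[::-1], V[:, ::-1]    # reorder A's eigenvalues to nonincreasing
    lam, U = np.linalg.eigh(B)        # B's eigenvalues stay nondecreasing
    return V @ U.T, float(sig @ lam)  # X = V U^T and Trace(Sigma Lambda)

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2

X_opt, mu_O = qqp_o_solution(A, B)
value = np.trace(A @ X_opt @ B @ X_opt.T)

# Random orthogonal matrices (Q factors of Gaussian matrices) never do better.
samples = []
for _ in range(2000):
    Qr, _ = np.linalg.qr(rng.standard_normal((n, n)))
    samples.append(np.trace(A @ Qr @ B @ Qr.T))
best_sample = min(samples)
```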



The Lagrangian dual of $\mathrm{QQP_O}$ is

\[ \max_{S = S^T} \min_X \ \operatorname{Trace} AXBX^T - \operatorname{Trace} S(XX^T - I). \qquad (31) \]

There can be a nonzero duality gap for this Lagrangian dual, however,
as shown by example in [121]. The inner minimization of problem (31)
is an unconstrained quadratic minimization in the elements of $X$, with
hidden constraint on the Hessian

\[ B \otimes A - I \otimes S \succeq 0. \]

On the other hand, the first order stationarity conditions at an optimum
$X$, for the original problem $\mathrm{QQP_O}$, are equivalent to $AXB = SX$ or,
by orthogonality, $AXBX^T = S$, which yields not only mutual
diagonalizability of $A$ and $XBX^T$ but also the characterization of the optimum.
One can easily construct examples with a duality gap, caused by a
conflict between the semidefinite condition and stationarity. In order to
close the duality gap, we need more constraints on $X$.

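Note that in $\mathrm{QQP_O}$ the constraints $XX^T = I$ and $X^TX = I$ are
equivalent. Adding the redundant constraints $X^TX = I$, we arrive at

\[
(\mathrm{QQP_{OO}}) \qquad \mu^O := \min \ \operatorname{Trace} AXBX^T \quad \text{s.t.} \quad XX^T = I, \ X^TX = I. \qquad (32)
\]

Using symmetric matrices $S$ and $T$ to relax the constraints $XX^T = I$
and $X^TX = I$, respectively, we obtain the dual problem

\[
(\mathrm{DQQP_{OO}}) \qquad
\begin{array}{rl}
\mu^O \ge \mu^D := \max & \operatorname{Trace} S + \operatorname{Trace} T \\
\text{s.t.} & (I \otimes S) + (T \otimes I) \preceq (B \otimes A) \\
 & S = S^T, \ T = T^T,
\end{array}
\]

because, by homogeneity, the inner minimization over $X$ is achieved at
$X = 0$.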
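
The Kronecker product form of the dual constraint rests on three trace identities for the column-stacked vector $\operatorname{vec}(X)$. The sketch below checks them on random data, which is an assumption for illustration, and then evaluates one simple dual-feasible pair $S = T = \tfrac12\lambda_{\min}(B \otimes A) I$. That pair is chosen only because its feasibility is obvious, so it illustrates weak duality and is generally not optimal.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
sym = lambda M: (M + M.T) / 2
A, B, S, T = (sym(rng.standard_normal((n, n))) for _ in range(4))
X = rng.standard_normal((n, n))
v = X.flatten(order="F")          # vec(X): columns stacked
I = np.eye(n)

# Trace(A X B X^T) = vec(X)^T (B kron A) vec(X), for symmetric B
lhs_obj, rhs_obj = np.trace(A @ X @ B @ X.T), v @ np.kron(B, A) @ v
# Trace(S X X^T)   = vec(X)^T (I kron S) vec(X)
lhs_S, rhs_S = np.trace(S @ X @ X.T), v @ np.kron(I, S) @ v
# Trace(T X^T X)   = vec(X)^T (T kron I) vec(X), for symmetric T
lhs_T, rhs_T = np.trace(T @ X.T @ X), v @ np.kron(T, I) @ v

# A trivially feasible dual point: (I kron S0) + (T0 kron I) = lam_min * I.
lam_min = np.linalg.eigvalsh(np.kron(B, A))[0]
S0 = T0 = (lam_min / 2) * I
dual_value = np.trace(S0) + np.trace(T0)

# Weak duality against an arbitrary orthogonal matrix.
Qx, _ = np.linalg.qr(rng.standard_normal((n, n)))
primal_value = np.trace(A @ Qx @ B @ Qx.T)
```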
Theorem 2.3 Strong duality holds for $\mathrm{QQP_{OO}}$ and $\mathrm{DQQP_{OO}}$, i.e.
$\mu^O = \mu^D$, and both primal and dual are attained. ■



A further relaxation of the above orthogonal relaxation is the trust
region relaxation, studied in [49],

\[
(\mathrm{QAPT}) \qquad \mu_{QAPT} := \min \ \operatorname{Trace} AXBX^T \quad \text{s.t.} \quad XX^T \preceq I. \qquad (33)
\]

The constraints are convex with respect to the Löwner partial order, i.e.
the partial order induced by the cone of positive semidefinite matrices.
This problem is visually similar to the TRS problem discussed above, and
we hope that methods for its solution will be useful. Therefore we would
like to find a characterization of optimality. The set $\{X : XX^T \preceq I\}$
is studied separately in [80, 30], and is useful in eigenvalue variational
principles.

It is shown in [5] that the following generalization of the
Hoffman-Wielandt inequality holds.

Theorem 2.4 Let $V^TAV = \Sigma$ and $U^TBU = \Lambda$ be the orthogonal
diagonalizations of $A$ and $B$, respectively, and let their eigenvalues be in
nonincreasing order, say $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$ and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$.
Then, for any $XX^T \preceq I$, we have the inequality

\[
\sum_{i=1}^n \min\{\lambda_i\sigma_{n-i+1}, 0\} \ \le \ \operatorname{Trace} AXBX^T \ \le \ \sum_{i=1}^n \max\{\lambda_i\sigma_i, 0\}.
\]

The upper bound is attained for $X = V \operatorname{Diag}(\epsilon) U^T$, where $\epsilon_i = 1$ if
$\sigma_i\lambda_i \ge 0$, and $\epsilon_i = 0$ otherwise. The lower bound is attained for
$X = V \operatorname{Diag}(\epsilon) J U^T$, where $\epsilon_i = 1$ if $\sigma_i\lambda_{n+1-i} \le 0$ and $\epsilon_i = 0$ otherwise.
Further, $J$ is the permutation matrix $(e_n, \ldots, e_1)$, where $e_i$ is the
$i$-th coordinate vector. ■
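
The bounds of Theorem 2.4 and the matrices that attain them can be checked numerically. The following sketch is an illustration under stated assumptions: the helper name `hw_bounds` and the random symmetric test matrices are hypothetical, and feasible points are sampled by scaling Gaussian matrices so that their spectral norm is at most one.

```python
import numpy as np

def hw_bounds(A, B):
    """Bounds of Theorem 2.4 on Trace(A X B X^T) over X X^T <= I,
    together with the matrices that attain them."""
    sig, V = np.linalg.eigh(A); sig, V = sig[::-1], V[:, ::-1]   # nonincreasing
    lam, U = np.linalg.eigh(B); lam, U = lam[::-1], U[:, ::-1]   # nonincreasing
    n = len(sig)
    lower = np.minimum(lam * sig[::-1], 0.0).sum()   # sum of min{lam_i sig_{n-i+1}, 0}
    upper = np.maximum(lam * sig, 0.0).sum()         # sum of max{lam_i sig_i, 0}
    J = np.eye(n)[:, ::-1]                           # permutation (e_n, ..., e_1)
    eps_up = (sig * lam >= 0).astype(float)
    eps_lo = (sig * lam[::-1] <= 0).astype(float)
    X_up = V @ np.diag(eps_up) @ U.T
    X_lo = V @ np.diag(eps_lo) @ J @ U.T
    return lower, upper, X_lo, X_up

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2
lower, upper, X_lo, X_up = hw_bounds(A, B)
obj = lambda X: np.trace(A @ X @ B @ X.T)

# Random feasible points: spectral norm at most 1, so X X^T <= I.
vals = []
for _ in range(2000):
    M = rng.standard_normal((n, n))
    vals.append(obj(M * (rng.uniform() / np.linalg.norm(M, 2))))
```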



For a scalar $\xi$ let $[\xi]^- := \min\{0, \xi\}$. The lower bound in the above
Theorem 2.4 establishes $\mu_{QAPT} = \sum_{i=1}^n [\lambda_i\sigma_{n-i+1}]^-$. Since the theorem
also provides the feasible point of attainment, i.e. an upper bound for the
relaxation problem, the theorem can be proved by showing that the value
$\mu_{QAPT}$ is attained by a Lagrangian dual program. However, there
can be a duality gap if we use the Lagrangian dual of the trust region
type relaxation with the constraint (33). Note that $XX^T$ and $X^TX$ have
the same eigenvalues, which implies $XX^T \preceq I$ if and only if $X^TX \preceq I$.
Explicitly, using both sets of constraints as in [6], we write QAPT in the
form

\[
(\mathrm{QAPTR}) \qquad \mu_{QAPT} = \min \ \operatorname{Trace} AXBX^T \quad \text{s.t.} \quad XX^T \preceq I, \ X^TX \preceq I.
\]

Next we apply Lagrangian relaxation to QAPTR, using the symmetric
matrices $S \succeq 0$ and $T \succeq 0$, say, to relax the constraints $XX^T \preceq I$ and
$X^TX \preceq I$, respectively. This gives the dual problem

\[
(\mathrm{DQAPTR}) \qquad
\begin{array}{rl}
\mu_{QAPT} \ge \mu^D_{QAPT} := \max & -\operatorname{Trace} S - \operatorname{Trace} T \\
\text{s.t.} & -(I \otimes S) - (T \otimes I) \preceq (B \otimes A) \\
 & S \succeq 0, \ T \succeq 0.
\end{array}
\]

The following properties are proved in [5].

Theorem 2.5 Strong duality holds for QAPTR and DQAPTR, i.e.
$\mu_{QAPT} = \mu^D_{QAPT}$, and both primal and dual are attained. ■



3. A STRENGTHENED BOUND FOR MC 

From the results in Section 2.1, it appears that we might have the 
strongest possible tractable bound for the max-cut problem MC (or 
its equivalent quadratic model MCQ). Bounds can be strengthened, 
however, by adding redundant constraints. We now present a strategy 
(recipe) for constructing relaxations. 

Algorithm 3.1 (Recipe for SDP relaxations) 

• add redundant constraints 

• homogenize 

• take Lagrangian dual 

• use hidden semidefinite constraint to obtain SDP equivalent 

• take Lagrangian dual of SDP 

• check Slater's CQ — project if it fails 

• delete redundant constraints from final SDP 

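We will use the above recipe to strengthen our bound for MC. To this
end, we introduce the following notation for linear operators and adjoints.
For $S \in \mathcal{S}^n$, the vector $s = \operatorname{svec}(S) \in \mathbb{R}^{t(n)}$ is formed (columnwise) from
$S$, ignoring the strictly lower triangular part of $S$, so $t(n) = n(n+1)/2$.
The inverse operator, which constructs $S \in \mathcal{S}^n$ from $s \in \mathbb{R}^{t(n)}$, is
$S = \operatorname{sMat}(s)$. The adjoint of svec is the operator hMat, where $\operatorname{hMat}(v)$ is
like $\operatorname{sMat}(v)$, $v \in \mathbb{R}^{t(n)}$, except that the off-diagonal elements are halved,
in order to provide the adjoint equation

\[ \operatorname{svec}(S)^Tv = \operatorname{Trace} S \operatorname{hMat}(v), \quad \forall S \in \mathcal{S}^n, \ v \in \mathbb{R}^{t(n)}. \]

The adjoint of sMat is the operator dsvec, which works like svec,
except that the off-diagonal elements are multiplied by 2, in order to
satisfy

\[ \operatorname{dsvec}(S)^Tv = \operatorname{Trace} S \operatorname{sMat}(v), \quad \forall S \in \mathcal{S}^n, \ v \in \mathbb{R}^{t(n)}. \]

For notational convenience, we define the vectors

\[ \operatorname{sdiag}(s) := \operatorname{diag}(\operatorname{sMat}(s)) \in \mathbb{R}^n, \quad \forall s \in \mathbb{R}^{t(n)}, \]

and

\[ \operatorname{vsMat}(s) := \operatorname{vec}(\operatorname{sMat}(s)) \in \mathbb{R}^{n^2}, \quad \forall s \in \mathbb{R}^{t(n)}. \]

The adjoint of vsMat is then given by

\[ \operatorname{vsMat}^*(v) = \operatorname{dsvec}\left( (\operatorname{Mat}(v) + \operatorname{Mat}(v)^T)/2 \right) \in \mathbb{R}^{t(n)}, \quad \forall v \in \mathbb{R}^{n^2}. \]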
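
These operators are simple to implement, and the adjoint equations give a convenient correctness test. The sketch below is one possible NumPy realization and is not code from [6]. It assumes the columnwise ordering of the upper triangle described above and takes Mat to be the column-stacking inverse of vec; the name `vsMat_adj` for $\operatorname{vsMat}^*$ is hypothetical.

```python
import numpy as np

def t(n):
    return n * (n + 1) // 2

def svec(S):
    """Stack the upper triangle of S column by column."""
    return np.concatenate([S[: j + 1, j] for j in range(S.shape[0])])

def sMat(s):
    """Inverse of svec: rebuild the symmetric matrix."""
    n = (int(round(np.sqrt(8 * len(s) + 1))) - 1) // 2
    S, k = np.zeros((n, n)), 0
    for j in range(n):
        S[: j + 1, j] = s[k : k + j + 1]
        k += j + 1
    return S + np.triu(S, 1).T

def hMat(v):
    """Adjoint of svec: like sMat, with off-diagonal elements halved."""
    S = sMat(v)
    return (S + np.diag(np.diag(S))) / 2

def dsvec(S):
    """Adjoint of sMat: like svec, with off-diagonal elements doubled."""
    return svec(2 * S - np.diag(np.diag(S)))

def sdiag(s):
    return np.diag(sMat(s))

def vsMat(s):
    return sMat(s).flatten(order="F")

def vsMat_adj(v, n):
    M = v.reshape((n, n), order="F")      # Mat(v)
    return dsvec((M + M.T) / 2)

rng = np.random.default_rng(5)
n = 4
S = rng.standard_normal((n, n)); S = (S + S.T) / 2
v = rng.standard_normal(t(n))
u = rng.standard_normal(n * n)
```

With these definitions the diagonal entries of `sMat(s)` sit at the (1-based) positions $t(1), \ldots, t(n)$ of `s`.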

The following bound is motivated by the strong duality results of 
Section 2.3, and is studied in depth in [6]. We recall that the SDP 
bound (20) for MCQ arises from a lifting procedure that employs 

\[ 0 \preceq X = xx^T \quad \text{and} \quad x^TQx = \operatorname{Trace} QX. \]

Discarding the rank one condition on X provides the tractable SDP 
bound, but it is not clear what constraints one can add to the P e prob- 
lem (21), in order to strengthen the Lagrangian relaxation. Linear com- 
binations of the constraints will not help since they are already included 
in the Lagrangian. Nor is it clear what linear constraints to add to the 
SDP relaxation (20). One can try including the so-called triangle in- 
equalities, which are standard in branch and bound methods for MC. 
This choice is sometimes very successful in practice [47], but one can- 
not guarantee an improvement [50]. Instead, we see that the matrices 
$X = xx^T$ have the property

\[ X^2 = xx^Txx^T = nX \]

in the case $x_i = \pm 1$, $i = 1, \ldots, n$. Therefore the quadratic matrix model

\[
\begin{array}{rl}
\mu^* := \max & \operatorname{Trace} QX \\
\text{s.t.} & \operatorname{diag}(X) = e \\
 & X^2 - nX = 0,
\end{array}
\qquad (34)
\]

where $X$ is a symmetric matrix, is equivalent to MCQ. A common
diagonalization of $X$ and $X^2$ shows that the only eigenvalues of $X$ are 0
and $n$, while (34) shows $\operatorname{Trace} X = n$. Therefore the rank of $X$ is one
and $X \succeq 0$.
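
A quick numerical check of these properties, for a randomly chosen $\pm 1$ vector (an assumption for illustration), is sketched below. It also shows that a matrix such as the identity, which has unit diagonal and is positive semidefinite, violates $X^2 = nX$ when $n > 1$, so this constraint does cut off points that satisfy only those two conditions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
x = rng.choice([-1.0, 1.0], size=n)    # a cut vector with entries +1 or -1
X = np.outer(x, x)

eigs = np.linalg.eigvalsh(X)           # ascending: n - 1 zeros, then n

# The identity has unit diagonal and is PSD, but is not of the form xx^T.
I = np.eye(n)
identity_violates = not np.allclose(I @ I, n * I)
```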

To illustrate the recipe (Algorithm 3.1), we add an additional
redundant constraint, $X \circ X = E$, where $E$ is the matrix of ones, obtaining
the program

\[
(\mathrm{MC2}) \qquad
\begin{array}{rl}
\mu^* = \max & \operatorname{Trace} QX \\
\text{s.t.} & \operatorname{diag}(X) = e \\
 & X \circ X = E \\
 & X^2 - nX = 0,
\end{array}
\qquad (35)
\]

which is equivalent to (34) and so to MC. To apply Lagrangian
relaxation efficiently, without losing information from the linear constraint,
we replace $\operatorname{diag}(X) = e$ by the norm constraint $\|\operatorname{diag}(X) - e\|^2 = 0$.
Then we introduce homogeneity by adding the variable $y_0$ and the
constraint $1 - y_0^2 = 0$. (Because of the homogenization, if $x$ and $y_0 = -1$
are optimal, then we can just multiply both $x$ and $y_0$ by $-1$ without
changing the optimal value or feasibility.) Thus, because $X = \operatorname{sMat}(x)$
is a symmetric matrix, we get the equivalent program

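\[
\begin{array}{rl}
\max\limits_{x,\,y_0} & y_0 \operatorname{Trace}(Q \operatorname{sMat}(x)) \\
\text{s.t.} & \operatorname{sdiag}(x)^T\operatorname{sdiag}(x) - 2y_0\, e^T\operatorname{sdiag}(x) + n = 0 \\
 & \operatorname{sMat}(x) \circ \operatorname{sMat}(x) = E \\
 & \operatorname{sMat}(x)^2 - n y_0 \operatorname{sMat}(x) = 0 \\
 & 1 - y_0^2 = 0.
\end{array}
\qquad (36)
\]

In fact, letting $z := \begin{pmatrix} y_0 \\ x \end{pmatrix}$, we see that we have another max-cut
problem with the same objective function as MCQ, additional linear
constraints $\operatorname{sdiag}(x) = e$ (these nodes must be grouped together), and the
extra nonlinear constraints $\operatorname{sMat}(x)^2 - n y_0 \operatorname{sMat}(x) = 0$.

As discussed in Remark 1.2, the Lagrange multipliers for symmetric
matrix valued constraint functions are symmetric matrices. Therefore
problem (36) has the Lagrangian dual

\[
\begin{aligned}
\mu^* \le \nu_2^* := \min_{w,T,S} \ \max_{x,\, y_0^2 = 1} \ & y_0 \operatorname{Trace}(Q \operatorname{sMat}(x)) \\
& + w \left( \operatorname{sdiag}(x)^T\operatorname{sdiag}(x) - 2y_0\, e^T\operatorname{sdiag}(x) + n \right) \\
& + \operatorname{Trace} T \left( E - \operatorname{sMat}(x) \circ \operatorname{sMat}(x) \right) \\
& + \operatorname{Trace} S \left( (\operatorname{sMat}(x))^2 - n y_0 \operatorname{sMat}(x) \right),
\end{aligned}
\]

where $w$, $T$ and $S$ are the Lagrange multipliers, the matrices $T$ and $S$
being symmetric. We can move the constraint on $y_0$ into the Lagrangian
without increasing the duality gap, which provides the form

\[
\begin{aligned}
\nu_2^* = \min_{w,T,S,t} \ \max_{x,\, y_0} \ & y_0 \operatorname{Trace}(Q \operatorname{sMat}(x)) \\
& + w \left( \operatorname{sdiag}(x)^T\operatorname{sdiag}(x) - 2y_0\, e^T\operatorname{sdiag}(x) + n \right) \\
& + \operatorname{Trace} T \left( E - \operatorname{sMat}(x) \circ \operatorname{sMat}(x) \right) \\
& + \operatorname{Trace} S \left( (\operatorname{sMat}(x))^2 - n y_0 \operatorname{sMat}(x) \right) + t(1 - y_0^2).
\end{aligned}
\]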

The inner maximization of the above relaxation is an unconstrained
pure quadratic maximization, so the optimal value is infinity unless the
Hessian is negative semidefinite (hidden constraint), in which case
$x = 0$ and $y_0 = 0$ are optimal. Therefore we need to evaluate the
Hessian of the Lagrangian to find the first SDP in the recipe.

Using $\operatorname{Trace}(Q \operatorname{sMat}(x)) = x^T\operatorname{dsvec}(Q)$, and adding a 2 for
convenience, we find that the constant part (no Lagrange multipliers) of the
Hessian is

\[
2H_c := 2 \begin{pmatrix} 0 & \tfrac12 \operatorname{dsvec}(Q)^T \\ \tfrac12 \operatorname{dsvec}(Q) & 0 \end{pmatrix}.
\]

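We now specify and manipulate some of the other linear operators that
appear in the Lagrangian, which facilitates the derivation of the Hessian
and the adjoints.

The matrix $\operatorname{sdiag}^*\operatorname{sdiag} \in \mathcal{S}^{t(n)}$, where $\operatorname{sdiag}^*$ is the adjoint of sdiag,
is diagonal and has the elements

\[
e_i^T(\operatorname{sdiag}^*\operatorname{sdiag})e_j = \operatorname{sdiag}(e_i)^T\operatorname{sdiag}(e_j)
= \begin{cases} 1 & \text{if } i = j = t(k) \text{ for some } k, \\ 0 & \text{otherwise.} \end{cases}
\qquad (37)
\]

Moreover, if $T = \sum_{ij} t_{ij}E_{ij}$, where the matrices $E_{ij} \in \mathcal{S}^n$ are the
elementary matrices, then linearity gives

\[ \operatorname{dsvec}(T \circ \operatorname{sMat}) = \sum_{ij} t_{ij} \operatorname{dsvec}(E_{ij} \circ \operatorname{sMat}). \qquad (38) \]

Also we have

\[ \operatorname{dsvec} \operatorname{Diag} \operatorname{diag} \operatorname{sMat} = \operatorname{sdiag}^*\operatorname{sdiag} = \operatorname{Diag} \operatorname{svec}(I_n), \qquad (39) \]

where the first two terms can be equated to a matrix because they are
linear operators from $\mathbb{R}^{t(n)}$ to $\mathbb{R}^{t(n)}$, and where $I_n$ is the $n \times n$ unit matrix.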
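
Relation (39) can be confirmed by assembling the matrix of the operator column by column. The sketch below is an illustration that re-declares compact versions of svec, sMat and dsvec, assuming the columnwise upper-triangular ordering used in the text. It applies $\operatorname{dsvec} \operatorname{Diag} \operatorname{diag} \operatorname{sMat}$ to each unit vector of $\mathbb{R}^{t(n)}$ and compares the result with $\operatorname{Diag} \operatorname{svec}(I_n)$.

```python
import numpy as np

def svec(S):
    return np.concatenate([S[: j + 1, j] for j in range(S.shape[0])])

def sMat(s):
    n = (int(round(np.sqrt(8 * len(s) + 1))) - 1) // 2
    S, k = np.zeros((n, n)), 0
    for j in range(n):
        S[: j + 1, j] = s[k : k + j + 1]
        k += j + 1
    return S + np.triu(S, 1).T

def dsvec(S):
    return svec(2 * S - np.diag(np.diag(S)))

n = 4
tn = n * (n + 1) // 2
E = np.eye(tn)

# Column j is the image of the j-th unit vector under dsvec Diag diag sMat.
M = np.column_stack(
    [dsvec(np.diag(np.diag(sMat(E[:, j])))) for j in range(tn)]
)
target = np.diag(svec(np.eye(n)))

# 1-based positions of the unit diagonal entries: t(1), ..., t(n).
ones_at = [j + 1 for j in range(tn) if M[j, j] == 1.0]
```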

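Using these relations, we rewrite the quadratic forms in the Lagrangian
as

\[
\begin{aligned}
\operatorname{sdiag}(x)^T\operatorname{sdiag}(x) &= x^T(\operatorname{dsvec} \operatorname{Diag} \operatorname{diag} \operatorname{sMat})\,x, \\
y_0\, e^T\operatorname{sdiag}(x) &= y_0\, (\operatorname{dsvec} I_n)^T x, \\
\operatorname{Trace} T(\operatorname{sMat}(x) \circ \operatorname{sMat}(x)) &= x^T\{\operatorname{dsvec}(T \circ \operatorname{sMat}(x))\} \\
&= x^T(\operatorname{dsvec}(T \circ \operatorname{sMat}))\,x, && (41) \\
\operatorname{Trace} S(\operatorname{sMat}(x))^2 &= \operatorname{Trace} \operatorname{sMat}(x)\, S \operatorname{sMat}(x) \\
&= x^T\operatorname{dsvec}(S \operatorname{sMat}(x)) && (42) \\
&= x^T(\operatorname{dsvec} S \operatorname{sMat})\,x.
\end{aligned}
\]

Because $S \operatorname{sMat}(x)$ may not be symmetric in (42), we let $\operatorname{dsvec}(B)$
denote $\operatorname{dsvec}(\tfrac12 B + \tfrac12 B^T)$ if $B$ is any unsymmetric $n \times n$ matrix. It is easy
to find the Hessian of a quadratic form written as $x^T(\cdot)\,x$. Therefore, we
can now write down the negative of the nonconstant part of the Hessian
of the inner maximization. Splitting it into four linear operators
with the factor 2, we find the formula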



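\[
\begin{aligned}
2\mathcal{H}(w,T,S,t) :={}& 2\mathcal{H}_1(w) + 2\mathcal{H}_2(T) + 2\mathcal{H}_3(S) + 2\mathcal{H}_4(t) \\
={}& 2w \begin{pmatrix} 0 & (\operatorname{dsvec} I_n)^T \\ \operatorname{dsvec} I_n & -\operatorname{sdiag}^*\operatorname{sdiag} \end{pmatrix}
+ 2 \begin{pmatrix} 0 & 0 \\ 0 & \operatorname{dsvec}(T \circ \operatorname{sMat}) \end{pmatrix} \\
& + 2 \begin{pmatrix} 0 & \tfrac{n}{2}\operatorname{dsvec}(S)^T \\ \tfrac{n}{2}\operatorname{dsvec}(S) & -\operatorname{dsvec} S \operatorname{sMat} \end{pmatrix}
+ 2t \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.
\end{aligned}
\]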

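After cancelling the 2, it follows that our calculation is the semidefinite
program

\[
(\mathrm{MCDSDP2}) \qquad
\begin{array}{rl}
\nu_2^* = \min & nw + \operatorname{Trace} ET + \operatorname{Trace} 0S + t \\
\text{s.t.} & \mathcal{H}(w,T,S,t) \succeq H_c.
\end{array}
\]

By taking $T$ to be sufficiently positive definite and $t$ to be sufficiently
large, we can guarantee Slater's constraint qualification. Therefore there
is no duality gap between this SDP and its dual, which is the following
strengthened SDP relaxation of MC

\[
(\mathrm{MCPSDP2}) \qquad
\begin{array}{rl}
\nu_2^* = \max & \operatorname{Trace} H_cY \\
\text{s.t.} & \mathcal{H}_1^*(Y) = n, \quad \mathcal{H}_2^*(Y) = E, \\
 & \mathcal{H}_3^*(Y) = 0, \quad \mathcal{H}_4^*(Y) = 1, \\
 & Y \succeq 0,
\end{array}
\]

where the superscripts $*$ denote adjoint operators as usual.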

To find the explicit form of the SDP relaxation, we now derive the
adjoint operators explicitly. We partition $Y$ as

\[
Y = \begin{pmatrix} Y_{00} & x^T \\ x & \bar{Y} \end{pmatrix}. \qquad (43)
\]
From (41) and

\[ e_i^T(\operatorname{dsvec}(T \circ \operatorname{sMat}))e_j = (\operatorname{dsvec}(T) \circ e_j)_i, \quad 1 \le i,j \le t(n), \]

we deduce

\[ \mathcal{H}_2^*(Y) = \operatorname{sMat} \operatorname{diag}(\bar{Y}). \]

Therefore, $\mathcal{H}_2^*(Y) = E$ and $\mathcal{H}_4^*(Y) = 1$ are equivalent to

\[ \operatorname{diag}(Y) = e. \qquad (44) \]

Also, $\mathcal{H}_1^*(Y)$ is twice the sum of the elements in the first row of $Y$ whose
positions correspond to the diagonal elements of $\operatorname{sMat}(\cdot)$ minus the sum
of the elements in these positions in the diagonal of $\bar{Y}$, so the notation
(43) provides

\[ \mathcal{H}_1^*(Y) = 2(\operatorname{svec} I_n)^Tx - \operatorname{Trace} \operatorname{Diag}(\operatorname{svec} I_n)\bar{Y}. \]

It follows from equation (44) that the constraint $\mathcal{H}_1^*(Y) = n$ requires

\[ Y_{0,t(i)} = 1, \quad \forall i = 1, \ldots, n. \qquad (45) \]

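Finally, we find $\mathcal{H}_3^*(Y)$. Recall that the definition of $\mathcal{H}_3$ is made
up of four blocks, with the bottom right block defined by minus the
quadratic form (42). Thus this block can be seen in two ways: either as
a quadratic form or as its $t(n) \times t(n)$ symmetric matrix representation
$-\operatorname{dsvec} S \operatorname{sMat}$. We regard $\operatorname{dsvec} \cdot \operatorname{sMat}$ as the linear operator that
maps any $n \times n$ symmetric matrix $S$ to the $t(n) \times t(n)$ symmetric matrix
$\operatorname{dsvec} S \operatorname{sMat}$. Further, we write $\operatorname{vsMat}^* \operatorname{vec}$ instead of dsvec ($= \operatorname{sMat}^*$),
in order to avoid applying dsvec to unsymmetric matrices, as mentioned
after (42). Therefore the definition of $\mathcal{H}_3(S)$ gives

\[
\begin{aligned}
\langle \mathcal{H}_3(S), Y \rangle &= n \operatorname{dsvec}(S)^Tx - \langle \operatorname{vsMat}^* \operatorname{vec} S \operatorname{sMat}, \bar{Y} \rangle \\
&= n \operatorname{Trace} S \operatorname{sMat}(x) - \langle S, \operatorname{sMat} \bar{Y} \operatorname{vsMat}^* \operatorname{vec} \rangle \\
&= \langle S, \ n \operatorname{sMat}(x) - \operatorname{sMat} \bar{Y} \operatorname{vsMat}^* \operatorname{vec} \rangle,
\end{aligned}
\]

which provides the adjoint

\[ \mathcal{H}_3^*(Y) = n \operatorname{sMat}(x) - \operatorname{sMat} \bar{Y} \operatorname{vsMat}^* \operatorname{vec}. \qquad (46) \]

We now have another linear operator that maps symmetric matrices into
symmetric matrices. Specifically, $\operatorname{sMat} \cdot \operatorname{vsMat}^* \operatorname{vec}$ maps any $t(n) \times t(n)$
symmetric matrix $\bar{Y}$ to $\operatorname{sMat} \bar{Y} \operatorname{vsMat}^* \operatorname{vec}$, which is construed as an $n \times n$
symmetric matrix by invoking the quadratic form

\[
v^T (\operatorname{sMat} \bar{Y} \operatorname{vsMat}^* \operatorname{vec})\, v = \langle vv^T, \operatorname{sMat} \bar{Y} \operatorname{vsMat}^* \operatorname{vec} \rangle
= \operatorname{Trace}\{\operatorname{sMat} \bar{Y} \operatorname{dsvec}(vv^T)\}.
\]

Indeed, the $k,l$ element of this quadratic form, which becomes the $k,l$
element of $\operatorname{sMat} \bar{Y} \operatorname{vsMat}^* \operatorname{vec}$, has the value

\[
e_k^T (\operatorname{sMat} \bar{Y} \operatorname{vsMat}^* \operatorname{vec})\, e_l = \tfrac12 \operatorname{Trace}\{\operatorname{sMat} \bar{Y} \operatorname{dsvec}(e_ke_l^T + e_le_k^T)\}.
\]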



The explicit form of $\mathcal{H}_3^*(Y)$ is given in the sums of MCPSDP3 below,
and also we have removed many of the redundant constraints. We see
that the space of the variables of the new problem has increased to
$\mathcal{S}^{t(n)+1}$ from $\mathcal{S}^n$ in (20), and that there are now $2t(n) - 1$ constraints.
They come from (44), (45), and the upper triangular part of (46). The
latter constraints can be related to the original condition $\operatorname{sMat}(x) =
\tfrac{1}{n}\operatorname{sMat}(x)\operatorname{sMat}(x)$. As in the original SDP relaxation MCSDP (20), we
still require the matrix of variables to have ones on the diagonal, but we
now have additional constraints on the first row of $Y$. Specifically, the
new problem has the form



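\[
(\mathrm{MCPSDP3}) \qquad
\begin{array}{rl}
\max & \operatorname{Trace} H_cY \\
\text{s.t.} & \operatorname{diag}(Y) = e \\
 & Y_{0,t(i)} = 1, \quad \forall i = 1, \ldots, n \\
 & Y_{0,t(j-1)+i} = \dfrac{1}{n} \Big\{ \sum\limits_{k=1}^{i} Y_{t(i-1)+k,\,t(j-1)+k}
   + \sum\limits_{k=i+1}^{j} Y_{t(k-1)+i,\,t(j-1)+k} \\
 & \qquad\qquad\qquad + \sum\limits_{k=j+1}^{n} Y_{t(k-1)+i,\,t(k-1)+j} \Big\}, \quad \forall\, 1 \le i < j \le n \\
 & Y \succeq 0.
\end{array}
\]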



Remark 3.1 Since the first row of $Y$ has some off-diagonal elements
equal to 1, and since all the diagonal elements are 1, the Slater constraint
qualification fails for this problem, i.e. we cannot have a positive definite
feasible matrix. Therefore we can project this problem onto a face of the
semidefinite cone, and then project into a smaller dimensional space to
get a new further simplified problem. The final program has constraints
that are linearly independent and that satisfy Slater's condition. Details
are given in [3].

The constraint $\mathcal{H}_3^*(Y) = 0$ is the key to proving the following useful
result. It is remarkable because the nonlinear constraint that $\operatorname{sMat}(x)$
be positive semidefinite (and in fact feasible for MCSDP) was discarded
in MC2. A proof is included to illustrate the usefulness of using adjoints.

Lemma 3.1 Suppose that $Y$ is feasible in MCPSDP3. Then its first
row satisfies

\[ \operatorname{sMat}(Y_{0,1:t(n)}) \succeq 0, \]

and so is feasible in the original SDP relaxation MCSDP (20).
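Proof.

Let $Y$ be feasible for MCPSDP3, and write

\[ Y = \begin{pmatrix} 1 & x^T \\ x & \bar{Y} \end{pmatrix}. \]

The fact that $\bar{Y}$ is a principal submatrix of $Y$ implies $\bar{Y} \succeq 0$. We have
previously established that

\[ \mathcal{H}_3^*(Y) = n \operatorname{sMat}(x) - \operatorname{sMat} \bar{Y} \operatorname{vsMat}^* \operatorname{vec}, \]

where $\operatorname{vsMat}^* \operatorname{vec}$ is essentially $\operatorname{sMat}^*$ except that it acts on possibly
nonsymmetric matrices. Therefore the constraint $\mathcal{H}_3^*(Y) = 0$ is equivalent
to

\[ \operatorname{sMat}(x) = \frac{1}{n} \operatorname{sMat} \bar{Y} \operatorname{vsMat}^* \operatorname{vec}. \]

Thus $\operatorname{sMat}(x)$ is a congruence of the positive semidefinite matrix $\bar{Y}$.
The result follows. ■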



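The term $X^2 - nX$, from the added nonlinear constraint $X^2 - nX = 0$
in the original max-cut problem (34) or (35), has the following interesting
and useful properties in the SDP relaxation.

Lemma 3.2 Suppose that both $X$ and $\bar{X}$ are feasible for MCSDP. Then

\[ \operatorname{Trace} (X^2 - nX)(\bar{X}^2 - n\bar{X}) \ge 0. \qquad (47) \]

Suppose, in addition, that

\[ (X^2 - nX) \ne 0 \quad \text{and} \quad (\bar{X}^2 - n\bar{X}) \ne 0 \]

hold, and that both $X$ and $\bar{X}$ are in $\mathcal{F}$, a face of $\mathcal{P}$, with $\bar{X} \in \operatorname{relint} \mathcal{F}$.
Then

\[ \operatorname{Trace} (X^2 - nX)(\bar{X}^2 - n\bar{X}) > 0. \qquad (48) \]

Proof. See [3]. ■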

The above two lemmas can be used to show the strength of this new
bound. Indeed, unless there is no gap between MCSDP and MC, the
relaxation MCPSDP3 always provides a strict improvement over MCSDP.
The proof is given in [3].

Theorem 3.1 The optimal values satisfy

\[ \nu_2^* \le \nu^* \quad \text{and} \quad \nu_2^* = \nu^* \ \Longrightarrow \ \nu_2^* = \mu^*. \qquad (49) \]

Thus we see that this new bound is strictly better than the
previous one whenever the previous one is not already exact. Numerical
results on small problems are presented in [3].
They consistently provide a significant improvement over the already
strong original SDP bound.

4. CONCLUSION

We have looked at several problems where strong relaxations exist. 
In each case we have shown that our theme holds; one cannot do better 
than the Lagrangian relaxation. In particular, this has led us to a recipe 
for finding strong relaxations for hard, discrete optimization problems 
and a strengthened relaxation for the max-cut problem. 
Abstract The modern era of interior-point methods dates to 1984, when Kar-
markar proposed his algorithm for linear programming. In the years 
since then, algorithms and software for linear programming have become 
quite sophisticated, while extensions to more general classes of problems, 
such as convex quadratic programming, semidefinite programming, and 
nonconvex and nonlinear problems, have reached varying levels of ma- 
turity. Interior-point methodology has been used as part of the solution 
strategy in many other optimization contexts as well, including ana- 
lytic center methods and column-generation algorithms for large linear 
programs. We review some core developments in the area. 

Keywords: optimization, interior-point methods 

1. INTRODUCTION 

Interior-point methods have been a topic of intense scrutiny by the 
optimization community during the past 15 years. Although methods 
of this type had been proposed in the 1950s, and investigated quite 
extensively during the 1960s [9], it was the announcement of an algo- 
rithm with intriguing complexity results and good practical performance 
by Karmarkar [20] that ushered in the modern era. This work placed 
interior-point methods at the top of the agenda for a large and diverse 
body of researchers and led to a series of remarkable advances in various 
areas of convex optimization. 

Today, interior-point methods for linear programming have become 
quite mature both in theory and in practice, and several high-quality 
codes are available. For the rival algorithm, the simplex method, the 
sudden appearance of credible competition spurred significant improve- 
ments in the software, resulting in a quantum advance in the state of 
the art in computational linear programming since 1988. 

M.J.D. Powell and S. Scholtes (Eds.), System Modelling and Optimization: Methods, Theory and 
Applications. © 2000 IFIP. Published by Kluwer Academic Publishers. All rights reserved. 
The theory of interior-point methods in other areas of convex pro-
gramming and monotone complementarity also appears to have reached 
a fairly advanced stage. The computational picture is less clear than 
for general linear programming, however. In some areas, such as semi- 
definite programming, there is no apparent alternative algorithm whose 
practical efficiency is comparable to the interior-point approach, while in 
others, such as quadratic programming, active-set methods (which de- 
scend from the simplex method for linear programming) provide strong 
competition. 

Investigation of the use of interior-point methods in various areas of 
nonconvex optimization, including discrete optimization, is in a much 
less advanced stage. The eventual prospects are still unclear, though 
early results in some areas (for example, nonlinear programming) show 
distinct promise. A thread common to many approaches is the use of 
interior-point methods to find inexact solutions of convex subproblems 
that arise during the course of the larger algorithm. 

We start in Section 2 by outlining the state of the art of interior-point 
methods in linear programming, discussing the pedigree of the most 
important algorithms, computational issues, and customization of the 
approach to structured problems. In Section 3, we discuss the straight- 
forward extensions to quadratic programming and linear complementar- 
ity, and compare the resulting algorithms with active-set methods. The 
extension to semidefinite programming is discussed in Section 4, along 
with the theoretical work on self-concordant functionals and self-scaled 
cones that forms the underpinning of some of this work. Finally, we 
present some conclusions in Section 5. 

A great deal of literature is available to the reader interested in delving 
further into this area. A number of recent books (Ye [44], Roos, Vial, and 
Terlaky [35], Wright [42]) give overviews of the area, from first principles 
to new results and practical considerations. Theoretical background 
on self-concordant functionals and related developments is described by 
Nesterov and Nemirovskii [28] and Renegar [34]. Technical reports from 
the past five years can be obtained from the Interior-Point Methods 
Online Web site at www.mcs.anl.gov/otc/InteriorPoint. 

For lack of space, we have omitted discussion of many interesting 
areas in which interior-point approaches are making an impact. Convex 
programming problems of the form 

\[ \min_x f(x) \quad \text{s.t.} \quad g_i(x) \le 0, \quad i = 1, 2, \ldots, m, \]

where $f$ and $g_i$, $i = 1, 2, \ldots, m$, are convex functions, can be solved by
extensions of the primal-dual approach of Section 3; see, for example, 
Ralph and Wright [32]. Interestingly, it is possible to prove superlinear 
convergence of the resulting algorithms without assuming linear
independence of the active constraints at the solution. This observation
prompted recent work on improving the convergence properties of other 
algorithms, notably sequential quadratic programming. A number of 
researchers have used interior-point methods in algorithms for combina- 
torial and integer programming problems. (In some cases, the interior- 
point method is used to find an inexact solution of related problems in 
which the integrality constraints are relaxed.) Recent computational re- 
sults are presented in Mitchell [24], and a comprehensive survey is given 
by Mitchell, Pardalos, and Resende [26]. In decomposition methods for 
large linear and convex problems, such as Dantzig-Wolfe/column
generation and Benders' decomposition, interior-point methods have been
used to find inexact solutions of the large master problems, or to
approximately solve analytic center subproblems to generate test points.
Approaches such as these are described by Gondzio and Sarkissian [16],
Gondzio and Kouwenberg [15], and in the survey paper of Goffin and
Vial [13]. Additionally, application of interior-point methodology to 
nonconvex nonlinear programming has occupied many researchers for 
some time now. The methods that have been proposed to date contain 
many ingredients, including primal-dual steps, barrier and merit func- 
tions, and scaled trust regions. Recent work in this area includes the 
reports of Byrd, Hribar, and Nocedal [5], Conn et al. [7], Gay, Overton, 
and Wright [11], and Forsgren and Gill [10]. 

2. LINEAR PROGRAMMING 

We consider first the linear programming problem, which we state in
standard form:

\[ \min_x \ c^Tx \quad \text{s.t.} \quad Ax = b, \ x \ge 0, \qquad (1) \]

where $x \in \mathbb{R}^n$ and $A \in \mathbb{R}^{m \times n}$. We assume that this problem has a strict
interior, that is, the set

\[ \mathcal{F}^{\circ} := \{ x \mid Ax = b, \ x > 0 \} \]

is nonempty, and that the objective function is bounded below on the set
of feasible points. Under these assumptions, (1) has a (not necessarily
unique) solution.

By using a logarithmic barrier function to account for the bounds
$x \ge 0$, we obtain the parametrized optimization problem

\[ \min_x \ f(x;\mu) := \frac{1}{\mu} c^Tx - \sum_{i=1}^n \log x_i \quad \text{s.t.} \quad Ax = b, \qquad (2) \]

where log denotes the natural logarithm, and $\mu > 0$ denotes the barrier
parameter. Because the logarithmic function requires its arguments to
be positive, the solution $x(\mu)$ of (2) must belong to $\mathcal{F}^{\circ}$. It is well known
(see, for example, Wright [40, Theorem 5]) that for any sequence $\{\mu_k\}$
with $\mu_k \downarrow 0$, all limit points of $\{x(\mu_k)\}$ are solutions of (1).

The traditional SUMT approach of Fiacco and McCormick [9]
accounts for equality constraints by including a quadratic penalty term in
the objective. When the constraints are linear, as in (1), it is simpler
and more appropriate to handle them explicitly. By doing so, we devise
a primal barrier algorithm in which a projected Newton method is used
to find an approximate solution of (2) for a certain value of $\mu$, and then
$\mu$ is decreased. The projected Newton step $\Delta x$ from a point $x$ satisfies
the following system:

\[
\begin{pmatrix} \mu X^{-2} & A^T \\ A & 0 \end{pmatrix}
\begin{pmatrix} \Delta x \\ \lambda^+ \end{pmatrix}
= - \begin{pmatrix} c - \mu X^{-1}e \\ Ax - b \end{pmatrix}, \qquad (3)
\]

where $X = \operatorname{diag}(x_1, x_2, \ldots, x_n)$ and $e = (1, 1, \ldots, 1)^T$. Note that

\[ \nabla^2_{xx} f(x;\mu) = X^{-2}, \qquad \nabla_x f(x;\mu) = (1/\mu)c - X^{-1}e, \]

so that the equations (3) are the same as those that arise from a
sequential quadratic programming algorithm applied to (2), modulo the scaling
by $\mu$ in the first line of (3). A line search can be performed along $\Delta x$ to
find a new iterate $x + \alpha\Delta x$, where $\alpha > 0$ is the step length.

The prototype primal barrier algorithm can be specified as follows:

primal barrier algorithm

Given $x^0 \in \mathcal{F}^{\circ}$ and $\mu_0 > 0$;

Set $k \leftarrow 0$;

repeat

    Obtain $x^{k+1}$ by performing one or more Newton steps (3),
    starting at $x = x^k$, and fixing $\mu = \mu_k$;

    Choose $\mu_{k+1} \in (0, \mu_k)$; $k \leftarrow k + 1$;

until some termination test is satisfied.
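
The prototype can be sketched in a few lines of NumPy. What follows is a minimal illustration under stated assumptions, not a production method: the function names, the Armijo backtracking rule, the stopping rules `n * mu <= tol` and `decrement < 1e-3`, the fixed reduction factor `shrink`, and the tiny test problem are all choices made here for illustration. The linear system solved in `newton_step` is (3).

```python
import numpy as np

def newton_step(A, b, c, x, mu):
    """Projected Newton step dx from system (3) at the point x."""
    m, n = A.shape
    K = np.block([[mu * np.diag(1.0 / x**2), A.T],
                  [A, np.zeros((m, m))]])
    rhs = -np.concatenate([c - mu / x, A @ x - b])
    return np.linalg.solve(K, rhs)[:n]

def barrier(c, x, mu):
    """Barrier objective f(x; mu) of problem (2)."""
    return c @ x / mu - np.sum(np.log(x))

def primal_barrier(A, b, c, x0, mu0=1.0, shrink=0.2, tol=1e-6):
    x, mu = np.asarray(x0, dtype=float), mu0
    n = len(x)
    while n * mu > tol:                        # termination test (assumed)
        for _ in range(50):                    # Newton steps with mu fixed
            dx = newton_step(A, b, c, x, mu)
            decrement = np.linalg.norm(dx / x) # scaled size of the step
            slope = (c / mu - 1.0 / x) @ dx    # directional derivative of f
            alpha = 1.0
            while alpha > 1e-12:               # backtracking line search
                x_new = x + alpha * dx
                if np.all(x_new > 0) and (
                    barrier(c, x_new, mu)
                    <= barrier(c, x, mu) + 1e-4 * alpha * slope
                ):
                    x = x_new
                    break
                alpha *= 0.5
            if decrement < 1e-3:               # close enough to x(mu)
                break
        mu *= shrink                           # choose mu_{k+1} in (0, mu_k)
    return x

# Tiny example: min x1 + 2*x2  s.t.  x1 + x2 + x3 = 1, x >= 0 (optimal value 0).
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0, 0.0])
x = primal_barrier(A, b, c, x0=np.full(3, 1.0 / 3.0))
```

The iterates stay strictly positive throughout, and the final objective value is of the order of the last barrier parameter used.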

A short-step version of this algorithm takes a single Newton step at
each iteration, with step length $\alpha = 1$, and sets

\[ \mu_{k+1} = \mu_k \Big/ \left( 1 + \frac{1}{8\sqrt{n}} \right). \qquad (4) \]


It is known (see, for instance, Renegar [34, Section 2.4]) that, if the
feasible region of (1) is bounded, and $x^0$ is sufficiently close to $x(\mu_0)$ in
a certain sense, then we obtain a point $x^k$ whose objective value $c^Tx^k$ is
within $\epsilon$ of the optimal value after

\[ O\left( \sqrt{n} \, \log \frac{n\mu_0}{\epsilon} \right) \ \text{iterations,} \qquad (5) \]

where the constant factor disguised by the $O(\cdot)$ depends on the properties
of (1) but is independent of $n$ and $\epsilon$.

The rate of decrease of $\mu$ in short-step methods is too slow to
allow good practical behavior, so long-step variants were proposed that
decrease $\mu$ more rapidly, while possibly taking more than one Newton
step for each $\mu_k$ and also using a line search. Although long-step
algorithms have better practical behavior, the complexity estimates
associated with them typically are no better than the estimate (5) for the
short-step approach; see Renegar [34, Section 2.4] and Gonzaga [17]. In
fact, a recurring theme of worst-case complexity estimates for linear
programming algorithms is that no useful relationship exists between the
estimate and the practical behavior of the algorithm.

Better practical algorithms are obtained from the primal-dual
framework. These methods recognize the importance of the path of solutions
$x(\mu)$ to (2) in the design of algorithms, but differ from the approach
above in that they treat the dual variables explicitly in the problem,
rather than as adjuncts to the calculation of the primal iterates.

The dual problem for (1) is

$$
\max_{(\lambda, s)} \; b^T \lambda \quad \text{s.t.} \quad
A^T \lambda + s = c, \quad s \ge 0,
\tag{6}
$$

where $s \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}^m$. The optimality
conditions for $x^*$ to be a solution of (1) and $(\lambda^*, s^*)$ to be a
solution of (6) are that $(x, \lambda, s) = (x^*, \lambda^*, s^*)$ satisfies

$$
\begin{aligned}
Ax &= b, && \text{(7a)} \\
A^T \lambda + s &= c, && \text{(7b)} \\
XSe &= 0, && \text{(7c)} \\
(x, s) &\ge 0, && \text{(7d)}
\end{aligned}
$$

where $X = \mathrm{diag}(x_1, x_2, \ldots, x_n)$ and
$S = \mathrm{diag}(s_1, s_2, \ldots, s_n)$, and where $(x, s) \ge 0$
indicates that all the components of $x$ and $s$ are nonnegative.
Primal-dual methods solve (1) and (6) simultaneously by generating a
sequence of iterates $(x^k, \lambda^k, s^k)$ that in the limit satisfies the
conditions (7). As mentioned above, the central path defined by the
following perturbed variant of (7) plays an important role in algorithm
design:



$$
\begin{aligned}
Ax &= b, && \text{(8a)} \\
A^T \lambda + s &= c, && \text{(8b)} \\
XSe &= \mu e, && \text{(8c)} \\
(x, s) &> 0, && \text{(8d)}
\end{aligned}
$$

where $\mu > 0$ parametrizes the path. Note that these conditions are
simply the optimality conditions for the problem (2): If
$(x(\mu), \lambda(\mu), s(\mu))$ satisfies (8), then $x(\mu)$ is a solution
of (2). We have from (8c) that a key feature of the central path is that

$$
x_i s_i = \mu, \quad \text{for all } i = 1, 2, \ldots, n,
\tag{9}
$$

that is, the pairwise products $x_i s_i$ are identical for all $i$.
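
The link between (8) and the barrier subproblem (2) can be written out directly. The short derivation below assumes that (2) has the form min $(1/\mu) c^T x - \sum_i \log x_i$ subject to $Ax = b$, which is consistent with the gradient $(1/\mu) c - X^{-1} e$ quoted earlier; the multiplier symbol $\nu$ is notation introduced here for illustration.

```latex
\begin{align*}
&\text{Stationarity for (2):}
  && \tfrac{1}{\mu}\, c - X^{-1} e - A^T \nu = 0, \qquad Ax = b, \qquad x > 0. \\
&\text{Define}
  && \lambda = \mu \nu, \qquad s = \mu X^{-1} e > 0. \\
&\text{Then}
  && A^T \lambda + s = c, \qquad XSe = \mu e, \qquad (x, s) > 0.
\end{align*}
```

Because (2) is a convex problem, these stationarity conditions are sufficient as well as necessary, so the $x$ component of any solution of (8) solves (2).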

In primal-dual algorithms, steps are generated by fixing $\mu$ at some
appropriate value (discussed below) and applying a perturbed Newton
method to the three equalities (8a), (8b), and (8c), which form a
nonlinear system in which the number of equations equals the number of
unknowns. We constrain all iterates $(x^k, \lambda^k, s^k)$ to have
$(x^k, s^k) > 0$, so that the matrices $X$ and $S$ remain positive diagonal
throughout, ensuring that the perturbed Newton steps are well defined.
Supposing that we are at a point $(x, \lambda, s)$ with $(x, s) > 0$ and the
feasibility conditions $Ax = b$ and $A^T \lambda + s = c$ are satisfied, the
primal-dual step $(\Delta x, \Delta \lambda, \Delta s)$ is obtained from
the following system:

$$
\begin{bmatrix} 0 & A & 0 \\ A^T & 0 & I \\ 0 & S & X \end{bmatrix}
\begin{bmatrix} \Delta \lambda \\ \Delta x \\ \Delta s \end{bmatrix}
= - \begin{bmatrix} 0 \\ 0 \\ XSe - \sigma \mu e + r \end{bmatrix},
\tag{10}
$$

where $\mu = x^T s / n$, $\sigma \in [0, 1]$, and $r$ is a perturbation term,
possibly chosen to incorporate higher-order information about the system
(8), or additional terms to improve proximity to the central path. If the
perturbation $r$ were not present, (10) would simply be the Newton system
for (8a), (8b), and (8c), where the value of $\mu$ is fixed at $\sigma \mu$.

Using the general step (10), we can state the basic framework for
primal-dual methods as follows:

primal-dual algorithm

Given $(x^0, \lambda^0, s^0)$ with $(x^0, s^0) > 0$;

Set $k \leftarrow 0$ and $\mu_0 = (x^0)^T s^0 / n$;

repeat

Choose $\sigma_k$ and $r^k$;

Solve (10) with $\mu = \mu_k$, $\sigma = \sigma_k$, and $r = r^k$
to obtain $(\Delta x^k, \Delta \lambda^k, \Delta s^k)$;

Set
$(x^{k+1}, \lambda^{k+1}, s^{k+1}) \leftarrow (x^k, \lambda^k, s^k) + \alpha_k (\Delta x^k, \Delta \lambda^k, \Delta s^k)$,
choosing $\alpha_k \in (0, 1]$ to ensure that $(x^{k+1}, s^{k+1}) > 0$;

Set $\mu_{k+1} \leftarrow (x^{k+1})^T s^{k+1} / n$; $k \leftarrow k + 1$;

until some termination test is satisfied.

The various algorithms that use this framework differ in the way that
they choose the starting point, the centering parameter $\sigma_k$, the
perturbation vector $r^k$, and the step $\alpha_k$. The simplest algorithm,
a short-step path-following method similar to the primal algorithm
described above, sets

$$
r^k = 0, \qquad \sigma_k = 1 - \frac{0.4}{\sqrt{n}}, \qquad \alpha_k = 1,
$$

and, for suitable choice of starting point, achieves convergence to a
feasible point $(x, \lambda, s)$ with $x^T s / n \le \epsilon$ for a given
$\epsilon$ in

$$
O\left( \sqrt{n} \, \log \frac{\mu_0}{\epsilon} \right) \text{ iterations.}
\tag{11}
$$
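
The short-step method just described can be sketched as follows. This is an illustrative sketch under stated assumptions rather than a general-purpose solver: it treats a linear program with a single equality constraint $a^T x = b$, so that eliminating $\Delta s$ and $\Delta x$ from (10) leaves a scalar equation for $\Delta \lambda$, and it starts from a point chosen by hand to lie exactly on the central path. The function name and the test problem are hypothetical.

```python
import math

def short_step_pd(a, x, lam, s, eps=1e-8):
    """Short-step path following from a feasible, well-centered start (m = 1)."""
    n = len(x)
    sigma = 1.0 - 0.4 / math.sqrt(n)        # centering parameter from the text
    mu = sum(xi * si for xi, si in zip(x, s)) / n
    iters = 0
    while mu > eps:
        # Third block of the right-hand side of (10) with r = 0
        r3 = [xi * si - sigma * mu for xi, si in zip(x, s)]
        d = [xi / si for xi, si in zip(x, s)]            # diagonal of S^{-1} X
        # Scalar normal equation: (a^T S^{-1} X a) dlam = a^T S^{-1} r3
        dlam = (sum(ai * ri / si for ai, ri, si in zip(a, r3, s))
                / sum(ai * ai * di for ai, di in zip(a, d)))
        ds = [-ai * dlam for ai in a]                    # A^T dlam + ds = 0
        dx = [-(ri + xi * dsi) / si
              for ri, xi, dsi, si in zip(r3, x, ds, s)]  # S dx + X ds = -r3
        # Full step, alpha_k = 1
        x = [xi + dxi for xi, dxi in zip(x, dx)]
        s = [si + dsi for si, dsi in zip(s, ds)]
        lam += dlam
        mu = sum(xi * si for xi, si in zip(x, s)) / n
        iters += 1
    return x, lam, s, iters

# Example: min x1 + 2*x2 + 3*x3  s.t.  x1 + x2 + x3 = 1, x >= 0.
c = [1.0, 2.0, 3.0]
a = [1.0, 1.0, 1.0]
lam0 = -10.0
s0 = [ci - lam0 for ci in c]                 # dual feasible: a*lam0 + s0 = c
mu0 = 1.0 / sum(1.0 / si for si in s0)       # makes sum(x0) = 1 below
x0 = [mu0 / si for si in s0]                 # x_i * s_i = mu0 for all i
x, lam, s, iters = short_step_pd(a, x0, lam0, s0)
print(iters, x, lam)
```

Each full step reduces $\mu$ by the factor $\sigma_k$, so the iteration count grows like $\sqrt{n} \log(\mu_0 / \epsilon)$, in line with (11).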

Note the similarity of both the algorithm and its complexity estimate
to the corresponding primal algorithm. As in that case, algorithms with
better practical performance, but not necessarily better complexity
estimates, can be obtained through more aggressive, adaptive choices of
the centering parameter (that is, $\sigma_k$ closer to zero). They use a
line search to maintain proximity to the central path. The proximity
requirement dictates, implicitly or explicitly, that while the condition
(9) may be violated, the pairwise products $x_i s_i$ must not be too
different from each other. Many such algorithms, including path-following,
potential-reduction, and predictor-corrector algorithms, are discussed in
Wright [42].

Most interior-point software for linear programming is based on
Mehrotra’s predictor-corrector algorithm [22], often with the higher-order
enhancements described by Gondzio [14]. This approach uses an adaptive
choice of $\sigma_k$, selected by first solving for the pure Newton step
(i.e., setting $r = 0$ and $\sigma = 0$ in (10)). If this step makes good
progress in reducing $\mu$, we choose $\sigma_k$ small so that the step
actually taken is quite close to this pure Newton step. Otherwise, we
enforce more centering and calculate a conservative direction by setting
$\sigma_k$ closer to 1. The perturbation vector $r^k$ is chosen to improve
the similarity between the system (10) and the original system (8) that it
approximates. Gondzio’s technique further enhances $r^k$ by performing
further solves of the system (10) with a variety of right-hand sides, where
each solve reuses the factorization of the matrix, and is therefore not too
expensive to perform.

To turn this basic algorithmic approach into a useful piece of
software, we must address many issues. These include problem formulation,
presolving to reduce the problem size, choice of the step length, linear
algebra techniques for solving (10), and user interfaces and input
formats.
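
Two of the ingredients mentioned above, the adaptive centering parameter and the choice of step length, can be sketched as follows. The cubic rule $\sigma = (\mu_{\rm aff} / \mu)^3$ used here is the heuristic commonly associated with Mehrotra's algorithm; it is not stated in this paper and is included as an assumption, and the function names are hypothetical.

```python
def max_step(v, dv):
    """Largest alpha in (0, 1] such that v + alpha * dv >= 0 componentwise."""
    alpha = 1.0
    for vi, dvi in zip(v, dv):
        if dvi < 0.0:
            alpha = min(alpha, -vi / dvi)
    return alpha

def mehrotra_sigma(x, s, dx_aff, ds_aff):
    """Centering parameter computed from the pure Newton step (sigma = 0, r = 0)."""
    n = len(x)
    mu = sum(xi * si for xi, si in zip(x, s)) / n
    a_p = max_step(x, dx_aff)        # primal step length to the boundary
    a_d = max_step(s, ds_aff)        # dual step length to the boundary
    # Value of mu that the pure Newton step would reach
    mu_aff = sum((xi + a_p * dxi) * (si + a_d * dsi)
                 for xi, dxi, si, dsi in zip(x, dx_aff, s, ds_aff)) / n
    # Assumed heuristic: small sigma when the pure Newton step makes good progress
    return (mu_aff / mu) ** 3
```

When the pure Newton step can be taken almost all the way to the solution, `mu_aff` is much smaller than `mu` and the resulting centering parameter is close to zero; when the step is blocked early by the boundary, the parameter moves toward 1.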

Possibly the most interesting issues are associated with the linear
algebra. Most codes deal with a partially eliminated form of (10), either
eliminating $\Delta s$ to obtain

$$
\begin{bmatrix} 0 & A \\ A^T & -X^{-1} S \end{bmatrix}
\begin{bmatrix} \Delta \lambda \\ \Delta x \end{bmatrix}
= - \begin{bmatrix} 0 \\ -X^{-1} (XSe - \sigma \mu e + r) \end{bmatrix},
\tag{12}
$$

or eliminating both $\Delta s$ and $\Delta x$ to obtain a system of the form

$$
A (S^{-1} X) A^T \Delta \lambda = t,
\tag{13}
$$

to which a sparse Cholesky algorithm is applied. A modified version of
the latter form is used when dense columns are present in $A$. These
columns may be treated as a low-rank update and handled via the
Sherman-Morrison-Woodbury formula or, equivalently, via a Schur
complement strategy applied to a system intermediate between (12) and
(13). In many problems, the matrix in (13) becomes increasingly
ill-conditioned as the iterates progress, eventually causing the Cholesky
process to break down as negative pivot elements are encountered. A
number of simple (and in some cases counterintuitive) patches have been
proposed for overcoming this difficulty while still producing useful
approximate solutions of (13) efficiently; see, for example, Andersen [2]
and Wright [43].
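
The elimination leading to (13) can be sketched with a small dense example. The code below is a sketch only: it assumes a feasible point and $r = 0$, for which the right-hand side works out to $t = A S^{-1} (XSe - \sigma \mu e)$, and it uses a plain dense Cholesky factorization in place of the sparse algorithms used by real codes. All function names are hypothetical.

```python
import math

def cholesky(M):
    """Dense Cholesky factor L (lower triangular) with M = L L^T."""
    m = len(M)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            acc = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(acc)    # raises if a pivot is negative
            else:
                L[i][j] = acc / L[j][j]
    return L

def chol_solve(L, t):
    """Solve L L^T z = t by forward and back substitution."""
    m = len(L)
    y = [0.0] * m
    for i in range(m):
        y[i] = (t[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    z = [0.0] * m
    for i in reversed(range(m)):
        z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, m))) / L[i][i]
    return z

def pd_step(A, x, s, sigma):
    """Solve (10) with r = 0 via the normal equations (13)."""
    m, n = len(A), len(x)
    mu = sum(xi * si for xi, si in zip(x, s)) / n
    r3 = [xi * si - sigma * mu for xi, si in zip(x, s)]   # XSe - sigma*mu*e
    d = [xi / si for xi, si in zip(x, s)]                 # diagonal of S^{-1} X
    M = [[sum(A[i][k] * d[k] * A[j][k] for k in range(n)) for j in range(m)]
         for i in range(m)]                               # A (S^{-1} X) A^T
    t = [sum(A[i][k] * r3[k] / s[k] for k in range(n)) for i in range(m)]
    dlam = chol_solve(cholesky(M), t)
    ds = [-sum(A[i][k] * dlam[i] for i in range(m)) for k in range(n)]
    dx = [-(r3[k] + x[k] * ds[k]) / s[k] for k in range(n)]
    return dx, dlam, ds
```

The factorization is computed once per call; solving for additional right-hand sides, as in the higher-order corrections described above, would reuse `L` and cost only the two triangular substitutions.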

Despite many attempts, iterative solvers have not shown much pro- 
mise as a means to solve (13), at least for general linear programs. A 
possible reason is that, besides its poor conditioning, the matrix lacks 
the regular spectral properties of matrices obtained from discretizations 
of continuous operators. Some codes do, however, use preconditioned 
conjugate gradient as an alternative to iterative refinement for improving 
the accuracy, when the direct approach for solving (13) fails to produce 
a solution of sufficient accuracy. The preconditioner used in this case is 
simply the computed factorization of the matrix $A (S^{-1} X) A^T$.

A number of interior-point linear programming codes are now avail- 
able, both commercially and free of charge. Information can be obtained 
from the World-Wide Web via the URL mentioned earlier. It is difficult 
to make blanket statements about the relative efficiency of interior-point 
and simplex methods for linear programming, as improvements to the 
implementations of both techniques continue to be made. Interior-point
methods tend to be faster on large problems and can better exploit
multiprocessor platforms, because the expensive operations such as Cholesky
factorization of (13) can be parallelized to some extent. They are not 
able to exploit “warm start” information — a good prior estimate of the 
solution, for instance — to the same extent as simplex methods. For this 
reason, they are not well suited for use in contexts such as branch-and- 
bound or branch-and-cut algorithms for integer programming, which 
solve many closely related linear programs. 

Several researchers have devised interior-point algorithms tailored to
special cases of (1), exploiting the structure of these cases in
solving the linear systems at each iteration. For network flow problems,
Mehrotra and Wang [23] consider preconditioned conjugate-gradient
methods for solving (13), in which the preconditioner is built from a
spanning tree for the underlying network. For multicommodity flow
problems, Castro [6] describes an algorithm for solving a version of (13)
in which the block-diagonal part of the matrix is used to eliminate many
of the variables, and a preconditioned conjugate-gradient method is
applied to the remaining Schur complement. Techniques for
stochastic programming (two-stage linear problems with recourse) are
described by Birge and Qi [4] and Birge and Louveaux [3, Section 5.6].

3. SIMPLE EXTENSIONS OF THE 
PRIMAL-DUAL APPROACH 

The primal-dual algorithms of the preceding section are readily ex- 
tended to convex quadratic programming (QP) and monotone linear 
complementarity (LCP), both classes being generalizations of linear pro- 
gramming. Indeed, many of the convergence and complexity properties 
of primal-dual algorithms were first elucidated in the literature with 
regard to monotone LCP. 

We state the convex QP as

$$
\min_x \; c^T x + \tfrac{1}{2} x^T Q x \quad \text{s.t.} \quad Ax = b, \quad x \ge 0,
\tag{14}
$$

where $Q$ is a positive semidefinite matrix. The monotone LCP is defined
by square matrices $M$ and $N$ and a vector $q$, where $M$ and $N$ satisfy a
monotonicity property: all vectors $y$ and $z$ that satisfy $My + Nz = 0$
have $y^T z \ge 0$. This problem requires us to identify vectors $y$ and $z$
such that

$$
My + Nz = q, \qquad (y, z) \ge 0, \qquad y^T z = 0.
\tag{15}
$$

With some transformations, we can express the optimality conditions
(7) for linear programming, and also the optimality conditions for (14),
as a monotone LCP. Other problems fit under the LCP umbrella as well,
including bimatrix games and equilibrium problems. The central path
for this problem is defined by the following system, parametrized as in
(8) by the positive scalar $\mu$:



$$
\begin{aligned}
My + Nz &= q, && \text{(16a)} \\
YZe &= \mu e, && \text{(16b)} \\
(y, z) &> 0, && \text{(16c)}
\end{aligned}
$$

and a search direction from a point $(y, z)$ satisfying (16a) and (16c) is
obtained by solving a system of the form

$$
\begin{bmatrix} M & N \\ Z & Y \end{bmatrix}
\begin{bmatrix} \Delta y \\ \Delta z \end{bmatrix}
= - \begin{bmatrix} 0 \\ YZe - \sigma \mu e + r \end{bmatrix},
\tag{17}
$$

where $\mu = y^T z / n$, $\sigma \in [0, 1]$, and, as before, $r$ is a
perturbation term. The corresponding search direction system for the
quadratic program (14) is identical to (10) except that the (2,2) block in
the coefficient matrix is replaced by $-Q$. The primal-dual algorithmic
framework and the many variations within this framework are identical to
the case of linear programming with the minor difference that the step
length should be the same for all variables. (In linear programming,
different step lengths can be, and often are, taken for the primal
variable $x$ and the dual variables $(\lambda, s)$.)

Complexity results are also similar to those obtained for the
corresponding linear programming algorithm. For an appropriately chosen
starting point $(y^0, z^0)$ with $\mu_0 = (y^0)^T z^0 / n$, we obtain
convergence to a point with $\mu \le \epsilon$ in

$$
O\left( n^\tau \log \frac{\mu_0}{\epsilon} \right) \text{ iterations,}
$$

where $\tau = 1/2$, $1$, or $2$, depending on the algorithm. Fast local
convergence results typically require an additional strict complementarity
assumption that is not necessary in the case of linear programming (see
Monteiro and Wright [27]), although some authors have proposed
superlinear algorithms that do not require this assumption. Algorithms
of the latter type require accurate identification of the set of
degenerate indices before the fast convergence becomes effective. This
property makes them of limited interest, since by the time the degenerate
set has been identified, the problem is essentially solved.

The LCP algorithms can, in fact, be extended to a wider class of
problems involving so-called sufficient matrices. Instead of requiring $M$
and $N$ to satisfy the monotonicity property defined above, we require
that there exist a nonnegative constant $\kappa$ such that

$$
y^T z \ge -4 \kappa \sum_{i \,\mid\, y_i z_i > 0} y_i z_i,
\quad \text{for all } y, z \text{ with } My + Nz = 0.
$$

The complexity estimate for interior-point methods applied to such
problems depends on the parameter $\kappa$; that is, the complexity is not
polynomial on the whole class of sufficient matrices.

Primal-dual methods have been applied to many practical applica- 
tions of (14) and (15). For example, an application to Markowitz’s 
formulation of the portfolio optimization problem is described by Take- 
hara [36]; applications to optimal control and model predictive control 
are described by Wright [41] and Rao, Wright, and Rawlings [33]; an 
application to regression is described by Portnoy and Koenker [31]. 

The interior-point approach has a number of advantages over the 
active-set approach from a computational point of view. It is difficult 
for an active-set algorithm to exploit any structure inherent in both Q 
and A, without redesigning most of the complex operations that make 
up this algorithm (adding a constraint to the active set, deleting a con- 
straint, evaluating Lagrange multiplier estimates, calculating the search 
direction, and so on). In the interior-point approach, on the other hand, 
the only complex operation is the solution of the linear system (17) — and 
this operation is fairly straightforward by comparison with the opera- 
tions in an active-set method. Since the structure and dimension of the 
linear system remain the same at all iterations, the routines for solv- 
ing the linear systems can be designed to fully exploit the properties of 
the systems arising from each problem class. In fact, the algorithm can 
be implemented to high efficiency using an object-oriented approach, in 
which the programmer of each new problem class needs to supply only 
code for the factorization and solution of the systems (17), optimized for 
the structure of the new class, along with a number of simple operations 
such as inner-product calculations. Code that implements upper-level
decisions (choice of parameter $\sigma$, vector $r$, step length $\alpha$)
remains efficient across the gamut of applications of (15) and can simply
be reused by all applications.

We note, however, that active-set methods may still require much less 
execution time than interior-point methods in many contexts, especially 
when “warm start” information is available, and when the problem is 
generic enough that not much benefit is gained by exploiting its struc- 
ture. 

The extension of primal-dual algorithms from linear programming to 
convex quadratic programming is so straightforward that a number of 
the interior-point linear programming codes have recently been extended
to handle problems in the class (14) as well. In their linear algebra
calculations, these codes treat both Q and A as general sparse matrices, 
and hence are efficient across a wide range of applications. By contrast, 
as noted in Gould and Toint [18, Section 4], implementations of active- 
set methods for (14) that are capable of handling even moderately sized 
problems have not been widely available. 

4. SEMIDEFINITE PROGRAMMING 

Here we discuss extensions of interior-point techniques to broad classes
of problems that include semidefinite programming (SDP) and second-order
cone programming. The SDP problem can be stated as

$$
\min_X \; C \bullet X \quad \text{s.t.} \quad X \succeq 0, \quad
A_i \bullet X = b_i, \quad i = 1, 2, \ldots, m,
\tag{18}
$$

where $X$, $C$, and $A_i$, $i = 1, 2, \ldots, m$, are $n \times n$ symmetric
matrices in $S\mathbb{R}^{n \times n}$, $X \succeq 0$ denotes the constraint
that $X$ be positive semidefinite, and $\bullet$ denotes the inner product
$P \bullet Q = \sum_{i,j} P_{ij} Q_{ij}$. By further restricting
$X$, $C$, and $A_i$ all to be diagonal, we recover the linear programming
problem (1). The class (18) has been studied intensively during the past
seven years, in part because of its importance in applications to control
systems and because many combinatorial problems have powerful SDP
relaxations. The second-order cone programming problem is

$$
\min \; \sum_i \left( c_i^T x_i + p_i t_i \right) \quad \text{s.t.} \quad
\sum_i \left( B_i x_i + d_i t_i \right) = b, \quad
\|x_i\|_2 \le t_i, \quad i = 1, 2, \ldots,
\tag{19}
$$

where each $x_i$ is a vector of length $n_i \ge 1$, each $B_i$ is an
$m_0 \times n_i$ matrix, $b$ and each $d_i$ are vectors of length $m_0$, and
each $t_i$ is a scalar. Convex quadratically constrained quadratic programs
can be posed in the form (19), along with sum-of-norms problems and many
other applications (see Lobo et al. [21]).

The key to extending efficient interior-point algorithms to these and 
other convex problems was provided by Nesterov and Nemirovskii [28]. 
The authors explored the properties of self-concordant functions. They 
showed that algorithms with polynomial complexity could be constructed 
by using barrier functions of this type for the inequality constraint, and 
then applying a projected Newton’s method to the resulting linearly 
constrained problem. 

Self-concordant functions are convex functions with the special
property that their third derivative can be bounded by some expression
involving their second derivative at each point in their domain. This
property implies that the second derivative does not fluctuate too rapidly
in a relative sense, so that the function does not deviate too much from
the second-order approximation on which Newton’s method is based. For
this reason, we can expect Newton’s method to perform reasonably well
on such a function.

Given a finite-dimensional real vector space $V$, an open, nonempty
convex set $S \subset V$, and a closed convex set $T \subset V$ with
nonempty interior, we have the following formal definition.

Definition 1 The function $F : S \to \mathbb{R}$ is self-concordant if it is
convex and if the following inequality holds for all $x \in S$ and all
$h \in V$:

$$
\left| D^3 F(x)[h, h, h] \right| \le 2 \left( D^2 F(x)[h, h] \right)^{3/2},
\tag{20}
$$

where $D^k F[h_1, h_2, \ldots, h_k]$ denotes the $k$th differential of $F$
along the directions $h_1, h_2, \ldots, h_k$.

$F$ is called strongly self-concordant if $F(x_i) \to \infty$ for all
sequences $x_i \in S$ that converge to a point on the boundary of $S$.

$F$ is a $\vartheta$-self-concordant barrier for $T$ if it is a strongly
self-concordant function for $\mathrm{int}\, T$, and the parameter

$$
\vartheta := \sup_{x \in \mathrm{int}\, T} F'(x)^T \left[ F''(x) \right]^{-1} F'(x)
\tag{21}
$$

is finite.

Note that the exponent 3/2 on the right-hand side of (20) makes the
condition independent of the scaling of $h$. It is shown by Nesterov and
Nemirovskii [28, Corollary 2.3.3] that, if $T \ne V$, then the parameter
$\vartheta$ is no smaller than 1.

It is easy to show that the log-barrier function of Section 2 is an
$n$-self-concordant barrier for the positive orthant $\mathbb{R}^n_+$ (that
is, it satisfies (21) for $\vartheta = n$) if we take

$$
V = \mathbb{R}^n, \qquad S = \mathbb{R}^n_{++}, \qquad
F(x) = -\sum_{i=1}^n \log x_i,
$$

where $\mathbb{R}^n_{++}$ denotes the strictly positive orthant. Another
interesting case is the second-order cone (or “ice-cream cone”), for which
we have

$$
V = \mathbb{R}^{n+1}, \qquad S = \{ (x, t) \mid \|x\|_2 < t \}, \qquad
F(x, t) = -\log \left( t^2 - \|x\|_2^2 \right),
\tag{22}
$$

where $t \in \mathbb{R}$ and $x \in \mathbb{R}^n$. In this case, $F$ is a
2-self-concordant barrier and is appropriate for the inequality
constraints in (19). A third important case is the cone of positive
semidefinite matrices, for which we have

$$
\begin{aligned}
V &= n \times n \text{ symmetric matrices,} \\
S &= n \times n \text{ symmetric positive definite matrices,} \\
F(X) &= -\log \det X,
\end{aligned}
$$

where $F$ is an $n$-self-concordant barrier. This barrier function can be
used to model the constraint $X \succeq 0$ in (18).
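
The claim for the log barrier can be checked numerically. The sketch below evaluates both sides of (20) and the quantity inside the supremum in (21) for $F(x) = -\sum_i \log x_i$, using the closed-form derivatives $D^2 F(x)[h, h] = \sum_i h_i^2 / x_i^2$ and $D^3 F(x)[h, h, h] = -2 \sum_i h_i^3 / x_i^3$. The function name is hypothetical, and a numerical check of this kind illustrates the property rather than proving it.

```python
import random

def log_barrier_check(x, h):
    """Return (|D^3 F[h,h,h]|, 2 (D^2 F[h,h])^{3/2}, F'^T [F'']^{-1} F') for the log barrier."""
    u = [hi / xi for hi, xi in zip(h, x)]
    d2 = sum(ui ** 2 for ui in u)            # D^2 F(x)[h,h]
    d3 = -2.0 * sum(ui ** 3 for ui in u)     # D^3 F(x)[h,h,h]
    grad = [-1.0 / xi for xi in x]           # F'(x)
    hess_inv = [xi ** 2 for xi in x]         # diagonal of [F''(x)]^{-1}
    theta = sum(g * hi * g for g, hi in zip(grad, hess_inv))
    return abs(d3), 2.0 * d2 ** 1.5, theta

rng = random.Random(0)
x = [rng.uniform(0.1, 10.0) for _ in range(5)]
h = [rng.uniform(-1.0, 1.0) for _ in range(5)]
print(log_barrier_check(x, h))
```

The third returned value equals the dimension at every strictly positive point, which is why the supremum in (21) is exactly $n$ for this barrier.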

Self-concordant barrier functions allow us to generalize the primal
barrier method of Section 2 to problems of the form

$$
\min \; \langle c, x \rangle \quad \text{s.t.} \quad Ax = b, \quad x \in T,
\tag{23}
$$

where $T$ is a closed convex set, $\langle c, x \rangle$ denotes a linear
functional on the underlying vector space $V$, and $A$ is a linear operator.
Similarly to (2), we define the barrier subproblem to be

$$
\min_x \; f(x; \mu) = \frac{1}{\mu} \langle c, x \rangle + F(x)
\quad \text{s.t.} \quad Ax = b,
\tag{24}
$$

where $F(x)$ is a self-concordant barrier and $\mu > 0$ is the barrier
parameter. Note that, by Definition 1, $f(x; \mu)$ is also a strongly
self-concordant function. The primal barrier algorithm for (23) based on
(24) is as follows:

primal barrier algorithm

Given $x^0 \in \mathrm{int}\, T$ and $\mu_0 > 0$;

Set $k \leftarrow 0$;

repeat

Obtain $x^{k+1} \in \mathrm{int}\, T$ by performing one or more projected
Newton steps for $f(\cdot\,; \mu_k)$, starting at $x = x^k$;

Choose $\mu_{k+1} \in (0, \mu_k)$; $k \leftarrow k + 1$;

until some termination test is satisfied.



Remarkably, the worst-case complexity of algorithms of this type
depends on the parameter $\vartheta$ associated with $F$, but not on any
properties of the data that defines the problem instance. For example, we
can define a short-step method in which a single full Newton step is taken
for each value of $k$, and $\mu$ is decreased according to

$$
\mu_{k+1} = \mu_k \Big/ \left( 1 + \frac{1}{8 \sqrt{\vartheta}} \right).
$$

Given a starting point with appropriate properties, we obtain an iterate
$x^k$ whose objective $\langle c, x^k \rangle$ is within $\epsilon$ of the
optimum in

$$
O\left( \sqrt{\vartheta} \, \log \frac{\vartheta \mu_0}{\epsilon} \right)
\text{ iterations.}
$$

Long-step variants also are discussed by Nesterov and Nemirovskii [28].
The practical behavior of these methods does, of course, depend strongly
on the properties of the particular problem instance.

The primal-dual algorithms of Section 2 can also be extended to more
general problems by means of the theory of self-scaled cones developed
by Nesterov and Todd [29, 30]. The basic problem considered is the
conic programming problem

$$
\min \; \langle c, x \rangle \quad \text{s.t.} \quad Ax = b, \quad x \in K,
\tag{25}
$$

where $K \subset \mathbb{R}^n$ is a closed convex cone, that is, a closed
convex set for which $x \in K \Rightarrow tx \in K$ for all nonnegative
scalars $t$, and $A$ denotes a linear operator from $\mathbb{R}^n$ to
$\mathbb{R}^m$. The dual cone for $K$ is denoted by $K^*$ and defined as

$$
K^* = \{ s \mid \langle s, x \rangle \ge 0 \text{ for all } x \in K \},
$$

and we can write the dual instance of (25) as

$$
\max \; \langle b, \lambda \rangle \quad \text{s.t.} \quad
A^* \lambda + s = c, \quad s \in K^*,
\tag{26}
$$

where $A^*$ denotes the adjoint of $A$. The duality relationships between
(25) and (26) are more complex than in linear programming, but if
either problem has a feasible point that lies in the interior of $K$ or
$K^*$, respectively, the strong duality property holds. That is, if the
optimal value of either (25) or (26) is finite, then both problems have
finite optimal values, and these values are the same.

$K$ is a self-scaled cone when its interior $\mathrm{int}\, K$ is the
domain of a self-concordant barrier function $F$ with certain strong
properties that allow us to define algorithms in which the primal and dual
variables are treated in a perfectly symmetric fashion and play
interchangeable roles. In particular, we have $K^* = K$ for such cones. The
full elucidation of the properties of self-scaled cones is quite
complicated, but it suffices to note here that the three cones mentioned
above (the positive orthant $\mathbb{R}^n_+$, the second-order cone (22),
and the cone of positive semidefinite symmetric matrices) are the most
interesting self-scaled cones. Their associated barrier functions are the
logarithmic functions already mentioned.

To build algorithms from the properties of self-scaled cones and their
barrier functions, the Nesterov-Todd theory defines a scaling point for
a given pair $x \in \mathrm{int}\, K$, $s \in \mathrm{int}\, K$ to be the
unique point $w$ such that $H(w) x = s$, where $H(\cdot)$ is the Hessian of
the barrier function. In the case of linear programming, it is easy to
verify that $w$ is the vector in $\mathbb{R}^n$ whose elements are
$\sqrt{x_i / s_i}$. The Nesterov-Todd search directions are obtained as
projected steepest descent directions for the primal and dual barrier
subproblems (that is, (24) and its dual counterpart), where a weighted
inner product involving the matrix $H(w)$ is used to define the
projections onto the spaces defined by the linear constraints $Ax = b$ and
$A^* \lambda + s = c$, respectively. The resulting directions satisfy the
following linear system:

$$
\begin{bmatrix} 0 & A & 0 \\ A^* & 0 & I \\ 0 & H(w) & I \end{bmatrix}
\begin{bmatrix} \Delta \lambda \\ \Delta x \\ \Delta s \end{bmatrix}
= - \begin{bmatrix} 0 \\ 0 \\ s + \sigma \mu \nabla F(x) \end{bmatrix},
\tag{27}
$$

where $\mu = \langle x, s \rangle / \vartheta$. (The correspondence with
(10) is complete if we choose the perturbation term to be $r = 0$.) By
choosing the starting point appropriately, and designing schemes for
choosing the parameters $\sigma$ and step lengths to take along these
directions, we obtain polynomial algorithms for this general setting.
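
For linear programming, the scaling point and the correspondence between (27) and (10) can be checked with a few lines of code. The sketch below assumes the log barrier $F(x) = -\sum_i \log x_i$, for which $H(w) = \mathrm{diag}(1 / w_i^2)$, $\nabla F(x) = -X^{-1} e$, and $\vartheta = n$; the function names are hypothetical.

```python
import math

def nt_scaling_point(x, s):
    """Scaling point w with H(w) x = s for the log barrier."""
    return [math.sqrt(xi / si) for xi, si in zip(x, s)]

def third_row_residuals(x, s, dx, ds, sigma):
    """Residuals of the third block rows of (27) and of (10) with r = 0."""
    n = len(x)
    mu = sum(xi * si for xi, si in zip(x, s)) / n      # <x, s> / theta, theta = n
    w = nt_scaling_point(x, s)
    # (27): H(w) dx + ds + (s + sigma * mu * grad F(x)), with grad F(x) = -1/x
    res27 = [dxi / wi ** 2 + dsi + (si - sigma * mu / xi)
             for xi, si, dxi, dsi, wi in zip(x, s, dx, ds, w)]
    # (10): S dx + X ds + (XSe - sigma * mu * e)
    res10 = [si * dxi + xi * dsi + (xi * si - sigma * mu)
             for xi, si, dxi, dsi in zip(x, s, dx, ds)]
    return res27, res10
```

Multiplying the residual of (27) componentwise by $x_i$ gives the residual of (10), so the two third rows have the same solutions in the linear programming case.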

Primal-dual algorithms for (25), where $K$ is a self-scaled cone, are
also studied by Faybusovich [8], who takes the viewpoint of differential 
geometry and, in particular, uses a Jordan algebra framework. 

In the important case of semidefinite programming (18), the
Nesterov-Todd framework is far from the only means for devising primal-dual
methods. Many algorithms proposed before and since this framework
was described do not fall under its umbrella, yet have strong theoretical
properties and, in some cases, much better practical behavior. To outline
a few of these methods, we write the dual of (18) as

$$
\max_{\lambda, S} \; b^T \lambda \quad \text{s.t.} \quad
\sum_{i=1}^m \lambda_i A_i + S = C, \quad S \succeq 0,
\tag{28}
$$

where $S \in S\mathbb{R}^{n \times n}$ (the symmetric $n \times n$ matrices)
and $\lambda \in \mathbb{R}^m$. Points on the central path for (18), (28)
are defined by the following parametrized system:

$$
\begin{aligned}
A_i \bullet X &= b_i, \quad i = 1, 2, \ldots, m, && \text{(29a)} \\
\sum_{i=1}^m \lambda_i A_i + S &= C, && \text{(29b)} \\
XS &= \mu I, && \text{(29c)} \\
X \succeq 0, \quad S &\succeq 0, && \text{(29d)}
\end{aligned}
$$

where as usual $\mu$ is the positive parameter. Unlike the corresponding
equations (8) for linear programming, the system (29a), (29b), (29c) is
not quite “square,” since the variables reside in the space
$S\mathbb{R}^{n \times n} \times \mathbb{R}^m \times S\mathbb{R}^{n \times n}$
while the range space of the equations is
$S\mathbb{R}^{n \times n} \times \mathbb{R}^m \times \mathbb{R}^{n \times n}$.
In particular, the product of two symmetric matrices (see (29c)) is not
necessarily symmetric. Before Newton’s method, the fundamental operation in
primal-dual algorithms, can be applied to (29), the domain and range have
to match. The different primal-dual algorithms differ in the ways that
they reconcile the domain and range of these equations.

The paper of Todd [37] is witness to the intensity of research in SDP
interior-point methods: It describes twenty techniques for obtaining
search directions for SDP. In many of these, the equation (29c) is
replaced by one whose range lies in $S\mathbb{R}^{n \times n}$, the space of
symmetric $n \times n$ matrices. That is, it is “symmetrized” and replaced
with a mapping

$$
\Theta(X, S) = 0.
\tag{30}
$$

In deriving the step $(\Delta X, \Delta \lambda, \Delta S)$, we approximate
the mapping $\Theta(X + \Delta X, S + \Delta S)$ with a linear
approximation of the form

$$
\Theta(X, S) + \mathcal{E} \Delta X + \mathcal{F} \Delta S,
\tag{31}
$$

for certain operators $\mathcal{E}$ and $\mathcal{F}$. We derive primal-dual
methods by using (31) along with the linear equations (29b) and (29a). The
heuristics
associated with linear programming algorithms — Mehrotra and Gondzio 
corrections, step length determination, and so on — translate in a fairly 
straightforward way to this setting. The implementations are much more 
complex, however, since the linear problem to be solved at each iteration 
has a much more complicated structure than that of (10). It is noted 
in Haeberly, Nayakkankuppam, and Overton [19] that the benefits of 
higher-order corrections in the SDP context are even more pronounced 
than in linear programming, since the cost of factoring the coefficient 
matrix relative to the cost of solving for a different right-hand side is 
much greater for SDP. 

Examples of the symmetrizations (30) include the Monteiro-Zhang
family, in which

$$
\Theta(X, S) = \tfrac{1}{2} \left( P (XS) P^{-1} + P^{-T} (XS)^T P^T \right) - \mu I,
$$

for some nonsingular $P$. The Alizadeh-Haeberly-Overton direction [1],
which appears to be the most promising one from a practical point of
view, is obtained by setting $P = I$, while the Nesterov-Todd direction
is obtained from

$$
P^T P = S^{1/2} \left( S^{1/2} X S^{1/2} \right)^{-1/2} S^{1/2}.
$$
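
A two-by-two example shows why symmetrization is needed and what the choice $P = I$ produces. This is a small sketch with hand-picked matrices; the helper names are hypothetical.

```python
def matmul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def aho_theta(X, S, mu):
    """Theta(X, S) = (XS + SX)/2 - mu*I, the Monteiro-Zhang map with P = I."""
    n = len(X)
    XS, SX = matmul(X, S), matmul(S, X)      # SX = (XS)^T when X and S are symmetric
    return [[0.5 * (XS[i][j] + SX[i][j]) - (mu if i == j else 0.0)
             for j in range(n)] for i in range(n)]

X = [[2.0, 1.0], [1.0, 2.0]]                 # symmetric positive definite
S = [[1.0, 0.0], [0.0, 3.0]]                 # symmetric positive definite
XS = matmul(X, S)                            # not symmetric in general
mu = (XS[0][0] + XS[1][1]) / 2               # X . S / n
theta = aho_theta(X, S, mu)
print(XS, theta)
```

The product `XS` has unequal off-diagonal entries, whereas `theta` is symmetric by construction, so the symmetrized equation has its range in the same space as the unknown $\Delta X$.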

A survey of the applications of SDP, ranging across eigenvalue opti- 
mization, structural optimization, control and systems theory, statis- 
tics, and combinatorial optimization, is given by Vandenberghe and 
Boyd [38]. The paper of Wolkowicz [39] in this volume discusses the
theory and algorithms associated with applications to combinatorial
problems. The main use of SDP in combinatorial optimization is in finding
SDP relaxations (that is, problems of the form (18) that contain all the
feasible points of the underlying combinatorial problem in their feasible 
sets) that yield high quality approximate solutions to the combinatorial 
problem. We illustrate the technique with possibly the most famous in- 
stance to date: the technique of Goemans and Williamson [12], which 
yields an approximate solution whose value is within 13% of optimality 
for the MAX CUT problem. 

In MAX CUT, we are presented with an undirected graph with $N$
vertices whose edges have nonnegative weights $w_{ij}$. The problem is to
choose a subset $S \subset \{1, 2, \ldots, N\}$ of the vertices so that the
sum of weights of the edges that cross from $S$ to its complement is
maximized. In other words, we aim to choose $S$ to maximize the objective

$$
w(S) = \sum_{i \in S, \; j \notin S} w_{ij}.
$$

This problem can be restated as an integer quadratic program by
introducing variables $y_i$, $i = 1, 2, \ldots, N$, such that $y_i = 1$ for
$i \in S$ and $y_i = -1$ for $i \notin S$. We then have

$$
\max_y \; \frac{1}{2} \sum_{i<j} w_{ij} (1 - y_i y_j) \quad \text{s.t.} \quad
y_i \in \{-1, 1\}, \quad i = 1, 2, \ldots, N.
\tag{32}
$$

This problem is NP-complete. Goemans and Williamson replace the
variables $y_i \in \mathbb{R}$ by vectors $v_i \in \mathbb{R}^N$ and
consider instead the problem

$$
\max_{v_1, v_2, \ldots, v_N} \; \frac{1}{2} \sum_{i<j} w_{ij} (1 - v_i^T v_j)
\quad \text{s.t.} \quad \|v_i\| = 1, \quad i = 1, 2, \ldots, N.
\tag{33}
$$

This problem is a relaxation of (32), because any feasible point $y$ for
(32) corresponds to a feasible point

$$
v_i = (y_i, 0, 0, \ldots, 0)^T, \quad i = 1, 2, \ldots, N,
$$

for (33). The problem (33) can be formulated as an SDP by changing
the variables $v_1, v_2, \ldots, v_N$ to a matrix
$Y \in \mathbb{R}^{N \times N}$ such that

$$
Y = V^T V, \quad \text{where } V = [v_1, v_2, \ldots, v_N].
$$

The constraints $\|v_i\| = 1$ can be expressed simply as $Y_{ii} = 1$, and,
since $Y = V^T V$, we must have $Y$ positive semidefinite. The transformed
version of (33) is then

$$
\max_Y \; \frac{1}{2} \sum_{i<j} w_{ij} (1 - Y_{ij}) \quad \text{s.t.} \quad
Y_{ii} = 1, \quad i = 1, 2, \ldots, N, \quad Y \succeq 0,
$$

which has the form (18) for appropriate definitions of $C$ and $A_i$,
$i = 1, 2, \ldots, N$. We can recover $V$ from $Y$ by performing a Cholesky
factorization. The final step of recovering an approximate solution to the
original problem (32) is performed by choosing a random vector
$r \in \mathbb{R}^N$, and setting

$$
y_i = \begin{cases} 1, & \text{if } r^T v_i > 0, \\ -1, & \text{if } r^T v_i \le 0. \end{cases}
$$

A fairly simple geometric argument shows that the solution so obtained has
expected objective value at least 0.87856 times the optimal value of (32).
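
The rounding step can be sketched in code. The example below does not solve the SDP: it assumes that unit vectors $v_i$ are already available and uses hand-picked vectors for a triangle with unit weights (three unit vectors 120 degrees apart, padded with a zero coordinate), which for this particular graph are an optimal solution of (33). The function names are hypothetical, and drawing $r$ with independent Gaussian components is one common choice of random direction.

```python
import math
import random

def cut_value(w, y):
    """Objective of (32): half the sum of w_ij * (1 - y_i * y_j) over i < j."""
    n = len(y)
    return 0.5 * sum(w[i][j] * (1 - y[i] * y[j])
                     for i in range(n) for j in range(i + 1, n))

def round_hyperplane(V, rng):
    """Random-hyperplane rounding: y_i = 1 if r^T v_i > 0, else -1."""
    dim = len(V[0])
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return [1 if sum(ri * vi for ri, vi in zip(r, v)) > 0 else -1 for v in V]

# Triangle with unit edge weights.
w = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
V = [[math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3), 0.0]
     for k in range(3)]
y = round_hyperplane(V, random.Random(0))
print(y, cut_value(w, y))
```

For this graph the relaxation (33) has value 9/4 while the best cut has weight 2, so the example also illustrates that the relaxed optimum is only an upper bound on the value of (32).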

Similar relaxations have been obtained for many other combinatorial 
problems, showing that it is possible to find good approximate solutions
to many NP-complete problems by using polynomial algorithms. Such 
relaxations are also useful if we seek exact solutions of the combinatorial 
problem by means of a branch-and-bound or branch-and-cut strategy. 
Relaxations can be solved at each node of the tree (in which some of 
the degrees of freedom are eliminated and some additional constraints 
are introduced) to obtain both a bound on the optimal solution and in 
some cases a candidate feasible solution for the original problem. Since 
the relaxations to be solved at adjacent nodes of the tree are similar, 
it is desirable to use solution information at one node to “warm start” 
the SDP algorithm at a child node. Mitchell [25] discusses an efficient 
warm-start strategy along these lines for the branch-and-cut approach. 



5. CONCLUSIONS 

Interior-point methods remain an active and fruitful area of research, 
although the frenetic pace that has characterized the area has slowed in 
recent years. Linear programming codes have become mainstream and 
continue to undergo development, although they face continuing stiff 
competition from the simplex method. Semidefinite programming has 
proved to be an area of major impact. Applications to quadratic pro- 
gramming show considerable promise, because of the superior ability of 
the interior-point approach to exploit problem structure efficiently. The 
influence on nonlinear programming theory and practice has yet to be 
determined, even though substantial research has already been devoted 
to this topic. Use of the interior-point approach in decomposition meth- 
ods appears promising, though no rigorous comparative studies with 
alternative approaches have been performed. Applications to integer 
programming problems have been tried by a number of researchers, but 
the interior-point approach is hamstrung here by competition from the 
simplex method with its superior warm-start capabilities. 



Stochastic Optimization 

Multiserver Retrial Queues: Optimization of the Retrial Rate 
J.R. Artalejo (Universidad Complutense de Madrid, Spain). 

Modeling and Estimation of Stochastic Volatility and Application to Option 
Pricing 
Shin-Ichi Aihara and Arunabha Bagchi (University of Twente, Netherlands). 

Symbolic Computation of Stochastic Sensitivities in Engineering Design 
H.P. Wynn and R.A. Bates (University of Warwick, UK). 

Scenario-Based Stochastic Programs: Changes of the Probability Distribution 
Jitka Dupačová (Charles University, Prague, Czech Republic) and 
Werner Römisch. 

Stochastic Optimization of Catastrophic Risk Portfolios 
Y.M. Ermoliev (IIASA, Laxenburg, Austria), T.Y. Ermolieva, 
G.J. MacDonald and V.I. Norkin. 

Optimization of Systems with Discrete Design Variables and Stochastic 
Responses by Sequential Linear Integer Programming 
S.J. Abspoel, L.F.P. Etman (Eindhoven University of Technology, 
Netherlands), A.J.G. Schoofs and J.E. Rooda. 

Solving a Sequence of Successively Discretized Multistage Stochastic Linear 
Programs 
Karl Frauendorfer and Gido Haarbrücker (University of St. Gallen, 
Switzerland). 

S-Estimation in Regression and Stochastic Optimization 
Keith Knight (University of Toronto, Canada). 

Approximate Solution of Stochastic Programs with Probability Functionals 
Riho Lepp (Tallinn Technical University, Estonia). 

Busy Period of an M/G/1 Retrial Queue: Optimization of Entropy Functionals 
M.J. Lopez-Herrero (Universidad Complutense de Madrid, Spain). 

Adaptive Control of Robots by Means of Stochastic Programming Techniques 
Kurt Marti (Universität der Bundeswehr, Munich, Germany). 

On the Numerical Solution of Jointly Chance Constrained Problems 
Janos Mayer (University of Zurich, Switzerland). 

A Stochastic Control Problem for Renewable Resource Exploitation 
Sara Pasquali (Università degli Studi di Padova, Italy). 

Concentrator Location Problem with Stochastic Demands 
Takayuki Shiina (Central Research Institute of Electric Power Industry, 
Japan). 

Constrained Global Optimization Using Stochastic Differential Equations 
on Manifolds 
Annelie Stohr (TU Munich, Germany). 

Approximation of a Stochastic Integer Program with Applications in 
Telecommunications 
Shane Dye, Leen Stougie and Asgeir Tomasgard (SINTEF Industrial 
Management, Norway). 

On Newton’s Methods for Multi-Stage Stochastic Nonlinear Programming 
Gongyun Zhao (National University of Singapore). 



Theory 

An Interior Linearization Method of Computing Fixed Points of Equilibrium 
Programming Problems 
Anatoly Antipin (Russian Academy of Sciences). 

Method of Chebyshev Points of Simplices in Convex Programming 
Valerian P. Bulatov (Russian Academy of Sciences) and Tatjana I. Belykh. 

On the Resolution of the Generalized Nonlinear Complementarity Problem 
R. Andreani, Ana Friedlander (State University of Campinas, Brazil) and 
Sandra A. Santos. 

Similarity Transformation Approach to Identifiability Results: An Algorithm 
and Some Comments 
Lilianne Denis-Vidal, Ghislaine Joly-Blanchard (University of Technology 
of Compiègne, France), Celine Noiret and Michel Petitot. 

On Some Relations Between Generalized Partial Derivatives and Convex 
Functions 
Peter Recht (University of Dortmund, Germany). 

A Rate Independent Evolution Variational Inequality with a Nonlinear 
Elliptic Part 
A.H. Siddiqi (King Fahd University of Petroleum and Minerals, Dhahran, 
Saudi Arabia) and M. Brokate. 

An Efficient Optimization Algorithm for Estimating the Norm of Inverse 
Lyapunov Operators 
Vasile Sima (Katholieke Universiteit Leuven, Belgium), Petko Petkov and 
Sabine Van Huffel. 

New Characterizations of Weak Sharp and Strict Local Minimizers in Nonlinear 
Programming 
Marcin Studniarski (University of Lodz, Poland) and Monika Studniarska. 

Proximal Methods for Variational Inequalities with Composed Monotone 
Operators 
A. Kaplan and R. Tichatschke (University of Trier, Germany). 

Application of Wavelet Transform to the Determination of Resonance 
Frequencies 
Mariusz Ziolko (University of Mining and Metallurgy, Krakow, Poland), 
Wojciech Batko and Tomasz Korbiel. 




